Hidden Markov Processes (HMP) are one of the basic tools of modern probabilistic modeling.
The characterization of their entropy remains, however, an open problem. Here the entropy of HMP
is calculated via the cycle expansion of the zeta-function, a method adopted from the theory of
dynamical systems. For a class of HMP this method produces exact results both for the entropy
and the moment-generating function. The latter allows one to estimate, via the Chernoff bound, the
probabilities of large deviations for the HMP. More generally, the method offers a representation of
the moment-generating function and of the entropy via convergent series.
I. INTRODUCTION.



Hidden Markov Processes (HMP) are generated by a Markov process observed via a memory-less noisy channel.
They are widely employed in various areas of probabilistic modeling: information theory, signal processing,
bioinformatics, mathematical economics, linguistics, etc. One of the main reasons for these numerous applications
is that HMP present simple and flexible models for a history-dependent random process. This is in contrast to the
Markov process, where the history is irrelevant, since the future of the process depends on its present state only.

Much attention was devoted to the entropy of HMP. It characterizes the
information content (minimal number of bits needed for a reliable encoding) of HMP viewed as a probabilistic source
of information. More specifically, the realizations generated in the long run of a random ergodic process, e.g. HMP,
are divided into two sets. The first (typical) set is the smallest set of realizations with the overall probability
close to one. The rest of the realizations are contained in the second, low-probability set. The entropy characterizes
the number of elements in the typical set. When HMP is employed as a model for information transmission
over a noisy channel, the entropy is still important, since it is the basic non-trivial component of the channel capacity
(other components needed for reconstructing the channel capacity are normally easier to calculate and characterize).
However, there is no direct formula for the entropy of HMP, in contrast to the Markov case where such a formula is
well known. Thus people studied the entropy via expansions around various limiting cases, or via upper and
lower bounds. There is also a general formalism that expresses the entropy of HMP
via the solution of an integral equation.
This formalism is, however, relatively difficult to apply in practice.

Since the entropy characterizes the number of typical long-run realizations, it is of interest to estimate the probability
of atypical realizations. These estimates are standardly given via the moment-generating function of the random
process. The knowledge of this function also allows one to reconstruct the entropy.

This paper presents a method for calculating the moment-generating function of HMP. The method is adopted from
the theory of chaotic dynamical systems, where it is known as the cycle expansion of the zeta-function [25], [27]. We
show that in a certain class of HMP one can obtain exact expressions for the moment-generating function and for the
entropy. For other cases the method offers analytic approximations of the moment-generating function via convergent
power series.

We attempted to make this paper self-contained and organized it as follows. Section II defines the HMP, settles
some notations, and recalls how to express the probabilities of HMP via a random matrix product. In section III we
briefly review the main facts about the entropy of an ergodic process and the corresponding typical (highly probable)
set of realizations. The main purpose of section IV is to relate the entropy of HMP to the spectral radius of the
corresponding random matrix product. This is done via the Lyapunov exponent of the random matrix product.
Section V discusses the moment-generating function of HMP. This function is employed (via Chernoff bounds) for
characterizing the atypical (improbable) realizations of HMP. Section VI shows how to calculate the entropy and
the generating function via the zeta-function and the periodic orbit expansion. Section VII discusses one of the
simplest examples of HMP and presents exact expressions for its entropy and the moment-generating function. Here
we also apply the moment-generating function for estimating atypical realizations of the HMP. Section VIII studies
another popular model for HMP, the binary symmetric HMP. It is shown that the presented approach reproduces known
approximate results and predicts several new ones. The last section briefly summarizes the obtained results. Some
issues, which are either too technical or too general for the present purposes, are discussed in the Appendices.
II. DEFINITION OF THE HIDDEN MARKOV PROCESS.



In this section we recall the definition of the Hidden Markov Process (HMP); see, e.g., [1] for reviews.

Let a discrete-time random process $S = \{S_0, S_1, S_2, \ldots\}$ be Markovian, with time-independent conditional probability

$$ \Pr[S_k = s_k \,|\, S_{k-1} = s_{k-1}] = \Pr[S_{k+l} = s_k \,|\, S_{k-1+l} = s_{k-1}] = p(s_k|s_{k-1}), \tag{1} $$

where $l$ is an integer. Each realization $s$ of the random variable $S$ takes values $s = 1, \ldots, L$. The joint probability of
the Markov process reads

$$ \Pr[S_N = s_N, \ldots, S_0 = s_0] = p(s_N|s_{N-1}) \cdots p(s_1|s_0)\, p(s_0) = \prod_{k=N}^{1} p(s_k|s_{k-1})\; p(s_0), \tag{2} $$

where $p(s_0)$ is the initial probability. The conditional probabilities $p(s_k|s_{k-1})$ define the $L \times L$ transition matrix $P$:

$$ P_{s_k s_{k-1}} = p(s_k|s_{k-1}). \tag{3} $$
We assume that the Markov process $S$ is mixing [18]: it has a unique stationary distribution $p_{\rm st}(s)$,

$$ \sum_{s'=1}^{L} p(s|s')\, p_{\rm st}(s') = p_{\rm st}(s), \tag{4} $$

that is established from any initial probability in the long-time limit. The transition matrix $P$ always has one eigenvalue
equal to 1 [since $P$ has a left eigenvector $(1, \ldots, 1)$], and the moduli [absolute values] of all other eigenvalues are not
larger than one (footnote 1). The mixing feature, however, demands that the eigenvalue equal to 1 is non-degenerate and that the
moduli of all other eigenvalues are smaller than 1. A sufficient condition for mixing is that all the conditional
probabilities $p(s_{i+1}|s_i)$ of the Markov process are positive (footnote 2). Taking $p(s_0) = p_{\rm st}(s_0)$ in (2) makes the process $S$
stationary.

Let random variables $X_i$, with realizations $x_i = 1, \ldots, M$, be noisy observations of $S_i$: the (time-invariant) conditional
probability of observing $X_i = x_i$ given the realization $S_i = s_i$ of the Markov process is $\pi(x_i|s_i)$. The joint probability
of the original process and its noisy observations reads

$$ P(s_N, \ldots, s_0; x_N, \ldots, x_1) = \prod_{k=1}^{N} \pi(x_k|s_k)\, p(s_k|s_{k-1})\; p_{\rm st}(s_0) \tag{5} $$

$$ = T_{s_N s_{N-1}}(x_N) \cdots T_{s_1 s_0}(x_1)\, p_{\rm st}(s_0), \tag{6} $$

where the $L \times L$ transfer-matrix $T(x)$ with matrix elements $T_{s_i s_{i-1}}(x)$ is defined as

$$ T_{s_i s_{i-1}}(x) = \pi(x|s_i)\, p(s_i|s_{i-1}). \tag{7} $$

Thus $X = \{X_1, X_2, \ldots\}$, called a hidden Markov process, results from observing the Markov process $S$ through a
memory-less process with the conditional probability $\pi(x|s)$. The composite process $SX$ is Markovian as well.

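The probabilities for the process $X$ are represented via the transfer-matrix product (similar representations have
been employed before)

$$ P(\mathbf{x}_{N\ldots1}) = \langle {\rm un}|\, T(\mathbf{x}_{N\ldots1})\, |{\rm st}\rangle, \tag{8} $$

$$ T(\mathbf{x}_{N\ldots1}) \equiv \prod_{k=N}^{1} T(x_k), \tag{9} $$

$$ \mathbf{x}_{N\ldots1} \equiv (x_N, \ldots, x_1), \tag{10} $$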



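1 Indeed, $\sum_k P_{ik} x_k = \nu x_i$ implies $|\nu|\,|x_i| = |\sum_k P_{ik} x_k| \le \sum_k P_{ik} |x_k|$; summing over $i$ and using $\sum_i P_{ik} = 1$ gives $|\nu| \sum_i |x_i| \le \sum_k |x_k|$, which then leads to $|\nu| \le 1$.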

2 Weaker sufficient conditions for mixing are that i) for any $(i, j)$ there exists a positive integer $m_{ij}$ such that $(P^{m_{ij}})_{ij} > 0$, i.e., for
some power of the matrix its entries are positive, and ii) $P$ has at least one positive diagonal element [18]. If we do assume the first
condition, but do not assume the second one, the eigenvalue 1 of $P$ is [algebraically and thus geometrically] non-degenerate, and is not
smaller than the absolute values of all other eigenvalues [18]. The corresponding [unique] eigenvector has strictly positive components.
However, it may be that the modulus of some other eigenvalue(s) is equal to 1, thus preventing the proper mixing, but still allowing for
ergodicity due to condition i).
where we used the bra(c)ket notations: $|{\rm st}\rangle$ is the column vector with elements $p_{\rm st}(k)$, $k = 1, \ldots, L$, and $\langle {\rm un}| = (1, \ldots, 1)$.
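The transfer-matrix representation (7)-(9) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical two-state Markov chain and a hypothetical binary observation channel (the numerical values of `P` and `pi`, and all function names, are illustrative choices and are not taken from the paper). It builds $T(x)$ from (7), evaluates $P(\mathbf{x}_{N\ldots1})$ from (8), and sums the probabilities over all sequences of a fixed length.

```python
import itertools
import numpy as np

def transfer_matrices(P, pi):
    """T(x)[i, j] = pi(x|i) * p(i|j), cf. (7); P[i, j] = p(i|j) has columns summing to one."""
    return [pi[x][:, None] * P for x in range(pi.shape[0])]

def stationary(P):
    """Stationary vector |st> of (4): eigenvector of P with eigenvalue 1, normalized to sum 1."""
    w, v = np.linalg.eig(P)
    st = np.real(v[:, np.argmax(np.real(w))])
    return st / st.sum()

def sequence_probability(xs, T, st):
    """P(x_N...x_1) = <un| T(x_N) ... T(x_1) |st>, cf. (8); xs = (x_1, ..., x_N) in time order."""
    v = st.copy()
    for x in xs:
        v = T[x] @ v          # later observations multiply from the left
    return v.sum()            # contraction with <un| = (1, ..., 1)

# Hypothetical example (values chosen only for illustration).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])            # P[i, j] = p(i|j)
pi = np.array([[0.8, 0.3],
               [0.2, 0.7]])           # pi[x, s] = pi(x|s)
T = transfer_matrices(P, pi)
st = stationary(P)

# Probabilities of all 2^6 observed sequences of length 6.
total = sum(sequence_probability(xs, T, st)
            for xs in itertools.product(range(2), repeat=6))
print(total)
```

Since $\sum_x T(x) = P$ and $\langle{\rm un}|P = \langle{\rm un}|$, the sum over all sequences should equal one up to rounding, which is a convenient consistency check for any implementation of (8).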

The HMP defined by (8) is (in general) not a Markov process, i.e., its probabilities do not factorize as in (2). Thus
the history of the process can become relevant. This is the underlying reason for widespread applications of HMP.

The process $X$ is stationary due to the stationarity of $S$:

$$ \Pr[X_{N+l} = x_N, \ldots, X_{1+l} = x_1] = \Pr[X_N = x_N, \ldots, X_1 = x_1] = P(x_N, \ldots, x_1), \tag{11} $$

where $l$ is a positive integer.

In addition, $X$ inherits the mixing feature from the underlying Markov process $S$, because the observation
process by itself is memoryless: $\pi(x_k|s_k) = \pi(x_k|s_k, s_{k-1}, \ldots, s_0)$. (The general definitions of ergodicity and mixing
are recalled below.)



A. Notations for the eigenvalues and singular values. 

For future purposes we fix some notations. For a matrix $A$, let $l_0[A], l_1[A], \ldots$ be the moduli of its
eigenvalues. We order $l_k[A]$ as

$$ \lambda[A] \equiv l_0[A] \ge l_1[A] \ge \ldots, \tag{12} $$

$\lambda[A]$ is called the spectral radius of $A$ [18]. If $A$ has non-negative matrix elements, the spectral radius is an eigenvalue
by itself [18]. Here are two obvious features of the function $\lambda$ ($d$ is a positive integer):

$$ \lambda[A^d] = (\lambda[A])^d, \tag{13} $$

$$ \lambda[AB] = \lambda[BA], \tag{14} $$

where (14) follows from the fact that $AB$ and $BA$ have identical eigenvalues: $AB|\psi\rangle = \nu|\psi\rangle$ implies $BA\,(B|\psi\rangle) = \nu\, B|\psi\rangle$.
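The two features (13) and (14) are easy to check numerically. The sketch below is only an illustration on randomly drawn non-negative matrices; the helper name `spectral_radius`, the seed and the matrix size are arbitrary assumptions.

```python
import numpy as np

def spectral_radius(A):
    """lambda[A]: the largest modulus among the eigenvalues of A, cf. (12)."""
    return np.max(np.abs(np.linalg.eigvals(A)))

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
A = rng.random((3, 3))
B = rng.random((3, 3))
d = 4

# Feature (13): lambda[A^d] = (lambda[A])^d.
lhs13 = spectral_radius(np.linalg.matrix_power(A, d))
rhs13 = spectral_radius(A) ** d

# Feature (14): lambda[AB] = lambda[BA].
lhs14 = spectral_radius(A @ B)
rhs14 = spectral_radius(B @ A)

# For comparison: the largest singular value is only sub-multiplicative.
sigma0_power = np.linalg.svd(np.linalg.matrix_power(A, d), compute_uv=False)[0]
sigma0 = np.linalg.svd(A, compute_uv=False)[0]
print(lhs13, rhs13, lhs14, rhs14, sigma0_power, sigma0 ** d)
```

The last line prints the largest singular value of $A^d$ next to the $d$-th power of the largest singular value of $A$; these two numbers need not coincide, which is why the eigenvalue-based quantity $\lambda$ is used whenever (13) and (14) are needed.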

Let $A^\dagger$ be the Hermitian (complex) conjugate of $A$. The singular values $\sigma_k[A] \ge 0$ of a matrix $A$ are the eigenvalues of
the hermitean matrix $\sqrt{A A^\dagger}$ or, equivalently, of $\sqrt{A^\dagger A}$; see Appendix A for a brief reminder on the features of the
singular values. We order $\sigma_k[A]$ as

$$ \sigma_0[A] \ge \sigma_1[A] \ge \ldots. \tag{15} $$



III. ENTROPY AND TYPICAL SET OF ERGODIC PROCESSES. 

The $N$-block entropy of a stationary [not necessarily Hidden Markov] random process $X$ is defined as

$$ H(N) \equiv H(X_1, \ldots, X_N) = -\sum_{\mathbf{x}_{N\ldots1}} P(\mathbf{x}_{N\ldots1}) \ln P(\mathbf{x}_{N\ldots1}), \tag{16} $$

where the probability $P(\mathbf{x}_{N\ldots1})$ is given as in (8), and where $\mathbf{x}_{N\ldots1}$ is defined in (10). Various features of $H(N)$ and
of several related quantities are discussed in Appendix B.

Using (16) one now defines the entropy (rate) of the random process $X$ as

$$ h = \lim_{N\to\infty} \frac{H(N)}{N}. \tag{17} $$

Alternative representations of $h$ are recalled in Appendix B. In particular, $h$ is the uncertainty [per unit of time] of
the random process given its long history.

For ergodic processes the above definition of entropy can be related to a single, long sequence of realizations.
First of all let us recall that the process $X$ is ergodic if it satisfies the weak law of large numbers (time average is
equal to the space average): for any function $f$ with a finite expectation value $\bar f = \sum_{x_k, \ldots, x_0} f[x_k, \ldots, x_0]\, P(x_k, \ldots, x_0)$,
we have probability-one convergence for $N \to \infty$:

$$ \frac{1}{N} \sum_{n=0}^{N-1} f[X_{n+k}, \ldots, X_n] \to \bar f, \tag{18} $$
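i.e., for any positive numbers $\varepsilon$ and $\delta$, there is an integer $\mathcal{N}(\varepsilon, \delta)$ such that for all $N > \mathcal{N}(\varepsilon, \delta)$,

$$ \Pr\left[\, \left| \frac{1}{N} \sum_{n=0}^{N-1} f[X_{n+k}, \ldots, X_n] - \bar f \right| > \varepsilon \right] < \delta. \tag{19} $$

Several alternative definitions of ergodicity are discussed in [35] (footnote 3).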

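Now the McMillan lemma states that for an ergodic process the entropy (17) characterizes individual realizations
in the sense of probability-one convergence for $N \to \infty$ (footnote 4):

$$ -\frac{1}{N} \ln P(\mathbf{x}_{N\ldots1}) \to h \quad {\rm or} \quad \Pr\left[\, \left| -\frac{1}{N} \ln P(\mathbf{x}_{N\ldots1}) - h \right| < \varepsilon \right] > 1 - \delta. \tag{20} $$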



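Based on (20) one defines the typical set $\Omega_N(\varepsilon)$ as the set of all $\mathbf{x}_{N\ldots1}$ that satisfy

$$ h - \varepsilon \le -\frac{1}{N} \ln P(\mathbf{x}_{N\ldots1}) \le h + \varepsilon. \tag{21} $$

Now (20) implies that $\Pr[\mathbf{x}_{N\ldots1} \in \Omega_N(\varepsilon)] \ge 1 - \delta$, i.e., the overall probability of $\Omega_N(\varepsilon)$ converges to one in the limit
$N \to \infty$. Since all elements in $\Omega_N(\varepsilon)$ have approximately equal probabilities, the number of elements $|\Omega_N(\varepsilon)|$ in
$\Omega_N(\varepsilon)$ scales as $e^{Nh}$. More precisely, this number is estimated from (20, 21) as

$$ (1 - \delta)\, e^{N(h - \varepsilon)} \le |\Omega_N(\varepsilon)| \le e^{N(h + \varepsilon)}. \tag{22} $$

Relations similar to (21) will be frequently written as

$$ P(\mathbf{x}_{N\ldots1}) \simeq e^{-Nh} \quad {\rm for} \quad \mathbf{x}_{N\ldots1} \in \Omega_N, \tag{23} $$

meaning that the precise sense of the asymptotic relation $\simeq$ for $N \to \infty$ can be clarified upon introducing proper $\varepsilon$
and $\delta$.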
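The two-sided estimate of the size of the typical set follows in two short steps from the definition of typicality and from the probability of the typical set. The derivation below is a sketch of the standard counting argument, written with the symbols $h$, $\varepsilon$, $\delta$ and $\Omega_N(\varepsilon)$ used above.

```latex
% Upper bound: every typical sequence has probability at least e^{-N(h+\varepsilon)},
% and the total probability cannot exceed one.
1 \;\ge\; \sum_{\mathbf{x}_{N\ldots1} \in \Omega_N(\varepsilon)} P(\mathbf{x}_{N\ldots1})
  \;\ge\; |\Omega_N(\varepsilon)|\, e^{-N(h+\varepsilon)}
  \quad\Longrightarrow\quad
  |\Omega_N(\varepsilon)| \le e^{N(h+\varepsilon)} .

% Lower bound: every typical sequence has probability at most e^{-N(h-\varepsilon)},
% and the typical set carries probability at least 1-\delta.
1 - \delta \;\le\; \sum_{\mathbf{x}_{N\ldots1} \in \Omega_N(\varepsilon)} P(\mathbf{x}_{N\ldots1})
  \;\le\; |\Omega_N(\varepsilon)|\, e^{-N(h-\varepsilon)}
  \quad\Longrightarrow\quad
  |\Omega_N(\varepsilon)| \ge (1-\delta)\, e^{N(h-\varepsilon)} .
```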



IV. LYAPUNOV EXPONENTS AND ENTROPY. 



The purpose of this section is to establish relation (29) between the entropy of a Hidden Markov Process and the
spectral radius of the associated random matrix product (8). The reader may skip this section if this relation is taken
for granted.



A. Singular values of the random-matrix product. 

The actual calculation of the entropy $h$ for non-Markov processes meets (in general) considerable difficulties. (For
Markov processes definition (17) applies directly, leading to the well-known formula for the entropy.) The first step
in calculating the entropy $h$ for a Hidden Markov Process (HMP) is to relate $h$ to the large-$N$ behaviour of the $L \times L$
matrix $T(\mathbf{x}_{N\ldots1})$, which defines the probability of HMP; see (8, 9). Recall that $T(\mathbf{x}_{N\ldots1})$ is a function of the random
process $X$. Assume that i) $X$ is stationary, as is the case after (11); ii) the average logarithm of the maximal singular
value of $T(x)$ is finite: $\langle \ln \sigma_0[T(x)] \rangle < \infty$; iii) $X$ is ergodic. Then the subadditive ergodic theorem applies, claiming
for $N \to \infty$ the probability-one convergence:

$$ -\frac{1}{N} \ln \sigma_k[T(\mathbf{x}_{N\ldots1})] \to \mu_k, \qquad k = 0, \ldots, L-1, \tag{24} $$



3 One such definition is worth mentioning: $X$ is ergodic if for any $k$, $m$ and $s$: $\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} \Pr[X_{n+k} = x_k, \ldots, X_n = x_0, X_{m+s} = y_m, \ldots, X_s = y_0] = P(x_k, \ldots, x_0)\, P(y_m, \ldots, y_0)$. This definition admits a straightforward and important generalization. $X$ is called mixing
if the above relation holds without the time-averaging $\frac{1}{N} \sum_{n=0}^{N-1}$, but in the limit $n \to \infty$.

4 The McMillan lemma contains two essential steps. The first is to realize that although the definition (18) of ergodicity does not
apply directly to $\frac{1}{N} \ln P(\mathbf{x}_{N\ldots1})$, it does apply to the probability $Q_m(\mathbf{x}_{N\ldots1}) = P(x_1, \ldots, x_m) \prod_{i=1}^{N-m} P(x_{m+i} | x_{m+i-1}, \ldots, x_i)$, which
defines an approximation of the original ergodic process by an $m$-order Markov process. In the second step, using a chain of inequalities
$\Pr[|\ln x| > N\varepsilon] \le \frac{1}{N\varepsilon} \langle |\ln x| \rangle \le \frac{1}{N\varepsilon} \langle 2x - \ln x \rangle$, one proves that for any stationary [not necessarily ergodic] process $Q_m(\mathbf{x}_{N\ldots1})$ is indeed a
good approximation in the sense of $\frac{1}{N} \ln \frac{Q_m(\mathbf{x}_{N\ldots1})}{P(\mathbf{x}_{N\ldots1})} \to 0$ for $N \gg m \to \infty$.
where $\sigma_k[T(\mathbf{x}_{N\ldots1})]$ are the singular values of $T(\mathbf{x}_{N\ldots1})$ (see section II A for notations), and where $\mu_k$ are called
Lyapunov exponents. According to (15) they are ordered as $\mu_0 \le \mu_1 \le \ldots$.

Using the definition (21) of the typical set, (24) can be written as an asymptotic relation $\sigma_k[T(\mathbf{x}_{N\ldots1})] \simeq e^{-N\mu_k}$ for
$\mathbf{x}_{N\ldots1} \in \Omega_N$ and sufficiently large $N$ [21]. Moreover, employing the singular value decomposition [see Appendix A],
one represents $T(\mathbf{x}_{N\ldots1})$ for $N \to \infty$ and $\mathbf{x}_{N\ldots1} \in \Omega_N$ as

$$ T(\mathbf{x}_{N\ldots1}) \simeq {\rm diag}[e^{-N\mu_0}, \ldots, e^{-N\mu_{L-1}}]\; U(\mathbf{x}), \tag{25} $$

where ${\rm diag}[a, \ldots, b]$ is a diagonal matrix with entries $a, \ldots, b$, and where $U(\mathbf{x})$ is an orthogonal matrix. The fact that
(for $N \to \infty$) the matrix $U$ does not depend on $N$ (but does in general depend on the realization $\mathbf{x}$) is a consequence
of the Oseledec theorem [22].

Thus the meaning of (25) is that the essential dependence of $T(\mathbf{x}_{N\ldots1})$ on $N$ is contained in the singular values
$e^{-N\mu_k}$, while $U(\mathbf{x})$ does not depend on $N$ for $N \to \infty$.



B. Eigenvalues of the random- matrix product. 

The above reasoning by itself is silent about the eigenvalues of $T(\mathbf{x}_{N\ldots1})$. Since the matrix $T(\mathbf{x}_{N\ldots1})$ is in general
not normal, i.e., the commutator of $T(\mathbf{x}_{N\ldots1})$ with its transpose $T^{\rm T}(\mathbf{x}_{N\ldots1})$ is not zero, the moduli $l_k[T(\mathbf{x}_{N\ldots1})]$ of
its eigenvalues are not automatically equal to its singular values $e^{-N\mu_k}$; see Appendix A. For us the knowledge of
the spectral radius $\lambda[T(\mathbf{x}_{N\ldots1})]$ will be important, because for calculating the entropy we shall employ a method that
essentially relies on the features (13, 14), which hold for the eigenvalues, but do not hold for singular values.

It is shown in Appendix D that the representation (25) can be used for deducing that in the limit $N \to \infty$ and for
$\mathbf{x}_{N\ldots1} \in \Omega_N$ the spectral radius $\lambda[T(\mathbf{x}_{N\ldots1})]$ of $T(\mathbf{x}_{N\ldots1})$ behaves as [recall (12)]

$$ \lambda[T(\mathbf{x}_{N\ldots1})] \simeq e^{-N\mu_0}, \tag{26} $$

where $\mu_0$ is the so-called top Lyapunov exponent. Appendix D discusses under which generic conditions (26) holds;
see also [23] in this context.

Using (5) we have asymptotically for $N \to \infty$ and $\mathbf{x}_{N\ldots1} \in \Omega_N$

$$ T(\mathbf{x}_{N\ldots1}) \simeq e^{-N\mu_0}\, |R(\mathbf{x})\rangle \langle L(\mathbf{x})| + O[e^{-N\mu_1(\mathbf{x}_{N\ldots1})}], \tag{27} $$

$$ P(\mathbf{x}_{N\ldots1}) \simeq e^{-N\mu_0 + O(1)} + O[e^{-N\mu_1(\mathbf{x}_{N\ldots1})}], \tag{28} $$

where we denoted $l_1[T(\mathbf{x}_{N\ldots1})] \equiv e^{-N\mu_1(\mathbf{x}_{N\ldots1})}$ [see (12)], and where $|R(\mathbf{x})\rangle$ and $\langle L(\mathbf{x})|$ are, respectively, the right
and left eigenvectors of $T(\mathbf{x}_{N\ldots1})$; see Appendix A. They do not depend on $N$ (for $N \to \infty$) for the same reason
as $U$ in (25) does not depend on $N$. In writing down (27) we assumed that the spectral radius $\lambda[T(\mathbf{x}_{N\ldots1})]$ is not a
degenerate eigenvalue of $T(\mathbf{x}_{N\ldots1})$, or at least that its algebraic and geometric degeneracies coincide (see Appendix
A). In the latter case one can use (27) with straightforward modifications and obtain (28).

The term $O[e^{-N\mu_1(\mathbf{x}_{N\ldots1})}]$ in (27, 28) can be neglected for $N \to \infty$ provided that $\mu_0 < \mu_1(\mathbf{x}_{N\ldots1} \in \Omega_N)$. The
multiplicative correction $O(1)$ in (28) comes from the eigenvectors in (27). This correction can be neglected if $\mu_0$
stays finite for $N \to \infty$. Below we assume that these two hypotheses hold. Comparing with (21), this implies a straightforward
relation between the entropy $h$ and the spectral radius $\lambda[T(\mathbf{x}_{N\ldots1})]$ of $T(\mathbf{x}_{N\ldots1})$:

$$ h = \mu_0 = \lim_{N\to\infty} \left\{ -\frac{1}{N} \ln \lambda[T(\mathbf{x}_{N\ldots1})] \right\}. \tag{29} $$

The relation between the top Lyapunov exponent and the entropy is known. The above discussion emphasizes
the role of the spectral radius in this relation.
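The relation between the entropy, the probability of a long typical realization and the spectral radius of the transfer-matrix product can be probed by a simple simulation. The sketch below assumes a hypothetical two-state binary HMP (the values of `P` and `pi`, the seed, the length `N` and all function names are illustrative assumptions). It generates one long realization and compares $-\frac{1}{N}\ln P(\mathbf{x}_{N\ldots1})$ with $-\frac{1}{N}\ln\lambda[T(\mathbf{x}_{N\ldots1})]$; both the vector and the matrix product are rescaled at every step to avoid numerical underflow.

```python
import numpy as np

def stationary(P):
    """Stationary vector of the column-stochastic matrix P, cf. (4)."""
    w, v = np.linalg.eig(P)
    st = np.real(v[:, np.argmax(np.real(w))])
    return st / st.sum()

def simulate_hmp(P, pi, N, rng):
    """Draw x_1..x_N: s_0 ~ p_st, s_k ~ p(.|s_{k-1}), x_k ~ pi(.|s_k), cf. (5)."""
    st = stationary(P)
    s = rng.choice(len(st), p=st)
    xs = np.empty(N, dtype=int)
    for k in range(N):
        s = rng.choice(P.shape[0], p=P[:, s])
        xs[k] = rng.choice(pi.shape[0], p=pi[:, s])
    return xs

def entropy_estimates(xs, P, pi):
    """Return (-ln P(x)/N, -ln lambda[T(x)]/N) for one realization xs = (x_1, ..., x_N)."""
    v, log_p = stationary(P), 0.0
    prod, log_scale = np.eye(P.shape[0]), 0.0
    for x in xs:
        Tx = pi[x][:, None] * P              # transfer matrix (7)
        v = Tx @ v
        log_p += np.log(v.sum())
        v /= v.sum()                         # rescale the vector, keep the log of the norm
        prod = Tx @ prod
        scale = prod.sum()
        log_scale += np.log(scale)
        prod /= scale                        # rescale the matrix product as well
    lam = np.max(np.abs(np.linalg.eigvals(prod)))
    return -log_p / len(xs), -(log_scale + np.log(lam)) / len(xs)

# Hypothetical example parameters (illustration only).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = np.array([[0.8, 0.3],
               [0.2, 0.7]])
xs = simulate_hmp(P, pi, 5000, np.random.default_rng(0))
h_prob, h_spec = entropy_estimates(xs, P, pi)
print(h_prob, h_spec)
```

The two numbers differ only through the $O(1)$ correction in the exponent, so their difference is expected to shrink as $1/N$; both fluctuate from realization to realization around the entropy rate.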



V. GENERATING FUNCTION AND ATYPICAL REALIZATIONS 



While the entropy characterizes typical realizations of the process, it is of interest (mainly for a finite number of
realizations) to describe atypical realizations, those which fall out of the typical set $\Omega_N$.
To this end let us introduce the generating function

$$ \Lambda^N(n, N) = \sum_{\mathbf{x}_{N\ldots1}} \lambda^n[T(\mathbf{x}_{N\ldots1})], \tag{30} $$

where $n$ is a non-negative number. (Note that $\Lambda^N(n, N)$ means $\Lambda(n, N)$ raised to the power $N$.)

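The generating function $\Lambda^N(n, N)$ is an analog of the partition sum in statistical physics (footnote 5). Writing

$$ \Lambda^N(n, N) = \sum_{\mathbf{x}_{N\ldots1} \in \Omega_N} \lambda^n[T(\mathbf{x}_{N\ldots1})] + \sum_{\mathbf{x}_{N\ldots1} \notin \Omega_N} \lambda^n[T(\mathbf{x}_{N\ldots1})], \tag{31} $$

one notes that in the limits $N \to \infty$ and $n \to 1$ the second contribution in the RHS of (31) can be neglected due to
definition (21, 23) of the typicality, and then $\Lambda^N(n, N) = \Lambda^N(n) = e^{-(n-1)Nh}$; see (22, 23). Here we already noted
that $\Lambda(n, N)$ does not depend on $N$ for $N \to \infty$, and denoted (in this limit) $\Lambda(n, N) = \Lambda(n)$.

Taking into account that $\Lambda(1) = 1$, the entropy $h$ is calculated via the derivative of the generating function:

$$ h = -\frac{1}{N} \left. \frac{\partial \Lambda^N(n)}{\partial n} \right|_{n=1} = -\left. \frac{d\Lambda(n)}{dn} \right|_{n=1} \equiv -\Lambda'(1) \tag{32} $$

$$ = -\frac{1}{N} \sum_{\mathbf{x}_{N\ldots1}} \lambda[T(\mathbf{x}_{N\ldots1})] \ln \lambda[T(\mathbf{x}_{N\ldots1})], \tag{33} $$

where the limit $N \to \infty$ is understood.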
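For short blocks the generating function and the derivative formula for the entropy can be evaluated by brute-force enumeration of all $M^N$ sequences. The sketch below is only an illustration under stated assumptions: the two-state binary HMP is hypothetical, the function names are made up for this example, and the block length is kept small because the cost grows exponentially with $N$.

```python
import itertools
import numpy as np

def spectral_radii(T, N):
    """Yield lambda[T(x_N)...T(x_1)] for every sequence (x_1, ..., x_N)."""
    L = T[0].shape[0]
    for xs in itertools.product(range(len(T)), repeat=N):
        prod = np.eye(L)
        for x in xs:
            prod = T[x] @ prod
        yield np.max(np.abs(np.linalg.eigvals(prod)))

def generating_function(T, n, N):
    """Lambda(n, N): the N-th root of the sum in (30)."""
    return sum(lam ** n for lam in spectral_radii(T, N)) ** (1.0 / N)

def entropy_finite_N(T, N):
    """Finite-N value of -(1/N) sum lambda ln lambda, cf. (33)."""
    total = sum(lam * np.log(lam) for lam in spectral_radii(T, N) if lam > 0)
    return -total / N

# Hypothetical two-state binary HMP (illustration only).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = np.array([[0.8, 0.3],
               [0.2, 0.7]])
T = [pi[x][:, None] * P for x in range(2)]

for N in (4, 8, 10):
    print(N, generating_function(T, 1.0, N), entropy_finite_N(T, N))
```

In the degenerate case of $1 \times 1$ transfer matrices (an i.i.d. source), $\Lambda(n, N) = \sum_x \pi^n(x)$ for every $N$ and the second function returns the Shannon entropy exactly, which gives a quick correctness check.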

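The generating function (30) can be employed for estimating the weight of atypical sequences. This estimate is
known as the Chernoff bound, and now we briefly recall its derivation adapted to our situation.

Consider the overall weight of atypical sequences that have probability lower than the typical-sequence probability
$e^{-Nh}$; see (21, 23). These atypical sequences are defined to satisfy

$$ -\ln \lambda[T(\mathbf{x}_{N\ldots1})] \ge (1 + \eta) N h, \tag{34} $$

where $\eta > 0$ quantifies the deviation from the typical behavior. Let $\sum'_{\mathbf{x}_{N\ldots1}}$ be the sum over all those $\mathbf{x}_{N\ldots1}$ that satisfy
(34). Define an auxiliary probability distribution $P(\mathbf{x}_{N\ldots1}|n) = \Lambda^{-N}(n, N)\, \lambda^n[T(\mathbf{x}_{N\ldots1})]$. The sought weight of
the atypical sequences is expressed as ($\eta > 0$ and $0 < n < 1$):

$$ \sum\nolimits'_{\mathbf{x}_{N\ldots1}} \lambda[T(\mathbf{x}_{N\ldots1})] = \Lambda^N(n, N) \sum\nolimits'_{\mathbf{x}_{N\ldots1}} P(\mathbf{x}_{N\ldots1}|n)\, e^{(1-n) \ln \lambda[T(\mathbf{x}_{N\ldots1})]} $$

$$ \le e^{N[\ln \Lambda(n, N) + (n-1)(1+\eta)h]} \sum\nolimits'_{\mathbf{x}_{N\ldots1}} P(\mathbf{x}_{N\ldots1}|n) \le e^{N[\ln \Lambda(n, N) + (n-1)(1+\eta)h]}. \tag{35} $$

Here the first inequality follows from (34) together with $1 - n > 0$, and the second one from the normalization of $P(\mathbf{x}_{N\ldots1}|n)$.
Eq. (35) leads to the following upper (Chernoff) bound for the weight of atypical sequences with probability lower
than $e^{-Nh}$:

$$ \sum_{-\ln \lambda[T(\mathbf{x}_{N\ldots1})] \ge (1+\eta)Nh} \lambda[T(\mathbf{x}_{N\ldots1})] \le e^{-N f(\eta)}, \tag{36} $$

$$ f(\eta) = \max_{0 < n < 1} \left[ \ln \frac{1}{\Lambda(n)} + (1 - n)(1 + \eta) h \right], \qquad \eta > 0. \tag{37} $$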

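Analogously to (35) we get for the weight of the atypical sequences with probability higher than $e^{-Nh}$
($0 < \eta < 1$):

$$ \sum_{-\ln \lambda[T(\mathbf{x}_{N\ldots1})] \le (1-\eta)Nh} \lambda[T(\mathbf{x}_{N\ldots1})] \le e^{-N g(\eta)}, \tag{38} $$

$$ g(\eta) = \max_{n > 1} \left[ \ln \frac{1}{\Lambda(n)} + (1 - n)(1 - \eta) h \right], \qquad \eta > 0. \tag{39} $$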

The functions $f(\eta)$ and $g(\eta)$ in (37) and (39), respectively, are called the rate functions [6]. It is seen that $f(\eta)$ and
$g(\eta)$ are the Legendre transforms of $\ln \Lambda(n)$. The latter is a convex function of $n$, $\frac{d^2}{dn^2} \ln \Lambda(n) \ge 0$, as follows from its
definition (30). Then $f(\eta)$ and $g(\eta)$ are convex as well [8]. For example, taking into account that $n$ and $\eta$ are related
via the extremum condition $\frac{d}{dn} \ln \Lambda(n) = -(1 + \eta) h$, we get

$$ f''(\eta) = \left( \frac{dn}{d\eta} \right)^2 \left. \frac{d^2}{dn^2} \ln \Lambda(n) \right|_{n = n(\eta)} \ge 0. $$

While the above reasoning is based on the Chernoff bounds, there is another (related, but more formal) approach
to describing atypical realizations, which is known as the measure concentration theory. This theory has recently
been applied to HMP.
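The rate function for the low-probability atypical sequences can be evaluated numerically once $\ln\Lambda(n)$ is available. The sketch below is a toy illustration under a strong simplifying assumption: the source is taken to be i.i.d. binary with symbol probabilities 0.3 and 0.7, so that $\Lambda(n) = 0.3^n + 0.7^n$ in closed form. The maximization over $0 < n < 1$ is done on a grid; the function name and the grid size are arbitrary choices.

```python
import numpy as np

def rate_function_f(log_Lambda, h, eta, num=20001):
    """f(eta) = max over 0 < n < 1 of [-ln Lambda(n) + (1 - n)(1 + eta) h], cf. (37)."""
    n = np.linspace(1e-6, 1 - 1e-6, num)
    return np.max(-log_Lambda(n) + (1 - n) * (1 + eta) * h)

# Toy assumption: i.i.d. binary source, hence Lambda(n) = sum_x p(x)^n.
p = np.array([0.3, 0.7])
log_Lambda = lambda n: np.log(np.sum(p[None, :] ** np.asarray(n)[:, None], axis=1))
h = -np.sum(p * np.log(p))            # entropy of the toy source, equal to -Lambda'(1)

f_values = {eta: rate_function_f(log_Lambda, h, eta) for eta in (0.05, 0.1, 0.2)}
for eta, f in f_values.items():
    print(eta, f)
```

For this toy source the exponent should coincide with the relative entropy between the tilted and the original symbol distributions as long as the maximum is reached inside $0 < n < 1$ (which is the case for the three values of $\eta$ used here); this is the standard large-deviation result for i.i.d. variables and serves as a sanity check of the grid maximization.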



5 $\Lambda(n, N)$ is sometimes called the generalized Lyapunov exponent. It is closely related to the concept of multi-fractality [21].
VI. ZETA FUNCTION AND ITS EXPANSION OVER THE PERIODIC ORBITS (CYCLES).

A. Zeta function and entropy. 



In this section we show how to adapt the method proposed in [25], [27] for calculating the moment-generating function
$\Lambda(n)$ (and thus for calculating the entropy $h$ via (32)). The method is based on the concepts of the zeta-function and
periodic orbits.

Define the inverse zeta-function as [18, 25, 26, 28]

$$ \zeta(z, n) = \exp\left[ -\sum_{m=1}^{\infty} \frac{z^m}{m} \Lambda^m(n, m) \right], \tag{40} $$

where $\Lambda^m(n, m) \ge 0$ is given by (30). The analogs of (40) are well known in the theory of dynamical systems; see [26]
for a mathematical introduction, and [25] for a physicist-oriented discussion.

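Since for large $N$, $\Lambda^N(n, N) \to \Lambda^N(n)$, the zeta-function $\zeta(z, n)$ has a zero at $z = \frac{1}{\Lambda(n)}$:

$$ \zeta\left( \frac{1}{\Lambda(n)}, n \right) = 0. \tag{41} $$

Indeed, for $z$ close to (but smaller than) $\frac{1}{\Lambda(n)}$, the series $\sum_{m=1}^{\infty} \frac{z^m}{m} \Lambda^m(n, m) \to \sum_{m=1}^{\infty} \frac{[z\Lambda(n)]^m}{m}$ almost diverges and one
has $\zeta(z, n) \to 1 - z\Lambda(n)$.

Recalling that $\Lambda(1) = 1$ and taking $n \to 1$ in

$$ \frac{d}{dn} \zeta\left( \frac{1}{\Lambda(n)}, n \right) = -\frac{\Lambda'(n)}{\Lambda^2(n)} \frac{\partial}{\partial z} \zeta\left( \frac{1}{\Lambda(n)}, n \right) + \frac{\partial}{\partial n} \zeta\left( \frac{1}{\Lambda(n)}, n \right) = 0, \tag{42} $$

we get for the entropy from (32)

$$ h = -\Lambda'(1) = -\left. \frac{\partial_n \zeta(z, n)}{\partial_z \zeta(z, n)} \right|_{z=1,\, n=1}. \tag{43} $$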
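A worked limiting case may help to fix the signs in the above relations. Assume, purely for illustration, that the hidden chain has a single state ($L = 1$), so that the transfer matrices reduce to numbers $T(x) = \pi(x)$ and $X$ is an i.i.d. source with symbol probabilities $\pi(x)$. Then every spectral radius factorizes and the zeta-function is known in closed form:

```latex
% Generating function: the sum in (30) factorizes over the N positions.
\Lambda^m(n,m) = \sum_{x_1,\ldots,x_m} \prod_{k=1}^{m} \pi^n(x_k)
               = \Big[ \sum_{x} \pi^n(x) \Big]^m
\quad\Longrightarrow\quad
\Lambda(n) = \sum_{x} \pi^n(x), \qquad \Lambda(1) = 1 .

% Zeta-function (40): the series sums to a logarithm for |z \Lambda(n)| < 1.
\zeta(z,n) = \exp\Big[ -\sum_{m=1}^{\infty} \frac{[z\Lambda(n)]^m}{m} \Big]
           = \exp\big[ \ln\big(1 - z\Lambda(n)\big) \big]
           = 1 - z\,\Lambda(n),
\qquad \zeta\big(1/\Lambda(n),\, n\big) = 0 .

% Entropy from (43): the ratio of partial derivatives at z = n = 1.
\partial_n \zeta\big|_{z=n=1} = -\Lambda'(1) = -\sum_x \pi(x)\ln\pi(x), \qquad
\partial_z \zeta\big|_{z=n=1} = -\Lambda(1) = -1,
\qquad
h = -\frac{\partial_n \zeta}{\partial_z \zeta}\Big|_{z=n=1} = -\sum_x \pi(x) \ln \pi(x) .
```

In this limit the formula reproduces the Shannon entropy of the i.i.d. source, as it should.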

B. Expansion over the periodic orbits. 

In Appendix E 2 we describe, following [25], [26], that under conditions (13, 14) one can expand $\zeta(z, n)$ over
the periodic orbits:

$$ \zeta(z, n) = \prod_{p=1}^{\infty} \prod_{\Gamma_p \in {\rm Per}(p)} \left[ 1 - z^p \lambda^n[T(\gamma_1) \cdots T(\gamma_p)] \right], \tag{44} $$

$$ \Gamma_p \equiv (\gamma_1, \ldots, \gamma_p), \tag{45} $$

where $\gamma_i = 1, \ldots, M$ are the indices referring to the realizations of the random process $X$. The set of periodic orbits
${\rm Per}(p)$ contains sequences $\Gamma_p = (\gamma_1, \ldots, \gamma_p)$ selected according to the following two rules: i) $\Gamma_p$ turns to itself after
$p$ successive cyclic permutations of its elements, but it does not turn to itself after any smaller (than $p$) number of
successive cyclic permutations; ii) if $\Gamma_p$ is in ${\rm Per}(p)$, then ${\rm Per}(p)$ contains none of those $p-1$ sequences obtained from
$\Gamma_p$ under $p-1$ successive cyclic permutations. Concrete examples of ${\rm Per}(p)$ for $M = 2, 3$ are given in Tables IV and V.
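The two selection rules for ${\rm Per}(p)$ translate directly into a short enumeration. The sketch below is an illustrative implementation (the function name `prime_cycles` and the choice of the lexicographically smallest rotation as the representative are assumptions made here, not conventions of the paper).

```python
import itertools

def prime_cycles(p, M):
    """One representative of each periodic orbit in Per(p) over the symbols 1..M.

    Rule i): all p cyclic permutations of the word must be distinct.
    Rule ii): only one word per class of cyclic permutations is kept
              (here: the lexicographically smallest rotation).
    """
    cycles = []
    for word in itertools.product(range(1, M + 1), repeat=p):
        rotations = [word[i:] + word[:i] for i in range(p)]
        if word == min(rotations) and len(set(rotations)) == p:
            cycles.append(word)
    return cycles

for p in range(1, 5):
    print(p, prime_cycles(p, 2))

counts_M2 = [len(prime_cycles(p, 2)) for p in range(1, 7)]
print(counts_M2)
```

For $M = 2$ the orbits of length up to four are $1$, $2$; $12$; $112$, $122$; $1112$, $1122$, $1222$, which are exactly the cycles that label the first factors of the product (44).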

It is more convenient to present (44) as an infinite sum [25], [27]

$$ \zeta(z, n) = 1 - z \sum_{l=1}^{M} \lambda_l^n + \sum_{k=2}^{\infty} \varphi_k(n)\, z^k, \tag{46} $$

where we defined

$$ \lambda_{\alpha \ldots \beta} \equiv \lambda[T(x_\alpha) \cdots T(x_\beta)], \qquad \lambda_{\alpha + \beta} \equiv \lambda[T(x_\alpha)]\, \lambda[T(x_\beta)], \tag{47} $$

and where $\varphi_k(n)$ are calculated from (44, 45) and the recipes presented in Appendix E. These calculations become tedious
for large values of $k$ in $\varphi_k(n)$. This is why in Appendix E 3 it is shown how to generate $\varphi_k(n)$ via Mathematica 5.






For two ($M = 2$) realizations of the HMP we employ the notations (47) and get for the first few terms of the
product (44) [consult Table IV for understanding the origin of these terms]

$$ \zeta(z, n) = (1 - z\lambda_1^n)(1 - z\lambda_2^n)(1 - z^2\lambda_{12}^n)(1 - z^3\lambda_{122}^n)(1 - z^3\lambda_{112}^n) \tag{48} $$

$$ \times\, (1 - z^4\lambda_{1222}^n)(1 - z^4\lambda_{1112}^n)(1 - z^4\lambda_{1122}^n) \prod_{p=5}^{\infty} \prod_{\Gamma_p \in {\rm Per}(p)} (1 - z^p \lambda_{\Gamma_p}^n). \tag{49} $$

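For the first six terms of the expansion (46) we get

$$ \varphi_2(n) = -\lambda_{12}^n + \lambda_{1+2}^n, \tag{50} $$

$$ \varphi_3(n) = -\lambda_{221}^n + \lambda_{2+21}^n - \lambda_{112}^n + \lambda_{1+12}^n, \tag{51} $$

$$ \varphi_4(n) = -\lambda_{1122}^n + \lambda_{2+211}^n - \lambda_{1222}^n + \lambda_{2+122}^n - \lambda_{1112}^n + \lambda_{1+211}^n \tag{52} $$

$$ \qquad - \lambda_{1+2+12}^n + \lambda_{1+122}^n, \tag{53} $$

$$ \varphi_5(n) = -\lambda_{11222}^n + \lambda_{1+1222}^n - \lambda_{11122}^n + \lambda_{2+1112}^n \tag{54} $$

$$ \qquad - \lambda_{11112}^n + \lambda_{1+1112}^n - \lambda_{12222}^n + \lambda_{2+1222}^n \tag{55} $$

$$ \qquad - \lambda_{12121}^n + \lambda_{1+1122}^n - \lambda_{12122}^n + \lambda_{2+1122}^n \tag{56} $$

$$ \qquad - \lambda_{1+2+122}^n + \lambda_{12+122}^n - \lambda_{1+2+112}^n + \lambda_{12+112}^n, \tag{57} $$

$$ \varphi_6(n) = -\lambda_{111122}^n + \lambda_{1+11122}^n - \lambda_{112122}^n + \lambda_{2+11122}^n - \lambda_{111222}^n + \lambda_{1+11222}^n \tag{58} $$

$$ \qquad - \lambda_{111212}^n + \lambda_{1+11212}^n - \lambda_{112222}^n + \lambda_{1+12222}^n - \lambda_{222121}^n + \lambda_{2+22121}^n \tag{59} $$

$$ \qquad - \lambda_{122222}^n + \lambda_{2+12222}^n - \lambda_{111112}^n + \lambda_{1+11112}^n - \lambda_{112212}^n + \lambda_{2+12121}^n \tag{60} $$

$$ \qquad - \lambda_{1+12+122}^n + \lambda_{1+12122}^n - \lambda_{2+12+211}^n + \lambda_{12+1122}^n - \lambda_{1+12+211}^n + \lambda_{12+2111}^n \tag{61} $$

$$ \qquad - \lambda_{2+12+122}^n + \lambda_{12+1222}^n - \lambda_{1+2+1222}^n + \lambda_{2+11222}^n - \lambda_{1+2+2111}^n + \lambda_{2+21111}^n \tag{62} $$

$$ \qquad - \lambda_{1+2+1122}^n + \lambda_{122+211}^n. \tag{63} $$

Here index strings that differ by a cyclic permutation denote the same quantity, due to (14).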

In section VII E we study examples where the expansion (46) can be summed exactly. In these examples the sum
in (46) converges exponentially for $|z| < a^n$, where $a > 1$ is a parameter. As discussed in [28], the exponential
convergence of $\zeta(z)$ is expected to be a general feature, and it is supported by rigorous results on the structure of the
zeta-function.

1. The structure of ipk(n). 

Note that $\varphi_k$ consists of an even number of terms. The terms are grouped in pairs, e.g., $[-\lambda_{221}^n + \lambda_{2+21}^n] + [-\lambda_{112}^n + \lambda_{1+12}^n]$
for $\varphi_3$, and analogously for the other $\varphi_k$'s. Each pair has the form $-\lambda_A^n + \lambda_B^n$, where $A$ and $B$ have the same number of
symbols 1 and the same number of symbols 2. This feature ensures that when the spectral radius of the product is
equal to the product of the spectral radii, all the terms $\varphi_k$ vanish. Ultimately, this is the feature that enforces
the convergence of (46) [25, 28]. Once it converges, we can approximate $\zeta(z, n)$ by a polynomial of a finite order.
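The polynomial approximation of the zeta-function can be sketched in a few lines: multiply out the product over periodic orbits up to a chosen order, read off the coefficients, and obtain the entropy as minus the ratio of the $n$- and $z$-derivatives at $z = n = 1$. The code below is an illustrative sketch under stated assumptions: the two-state binary HMP is hypothetical, the symbols are labeled from 0 for indexing convenience, the $n$-derivative is taken by a central finite difference, and all function names are invented for this example.

```python
import itertools
import numpy as np

def prime_cycles(p, M):
    """One representative per periodic orbit of length p over the symbols 0..M-1."""
    out = []
    for w in itertools.product(range(M), repeat=p):
        rots = [w[i:] + w[:i] for i in range(p)]
        if w == min(rots) and len(set(rots)) == p:
            out.append(w)
    return out

def spectral_radius(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

def zeta_coefficients(T, n, K):
    """Coefficients c[0..K] of the product over periodic orbits, truncated at order z^K."""
    c = np.zeros(K + 1)
    c[0] = 1.0
    for p in range(1, K + 1):
        for cycle in prime_cycles(p, len(T)):
            prod = np.eye(T[0].shape[0])
            for g in cycle:
                prod = prod @ T[g]
            weight = spectral_radius(prod) ** n
            shifted = np.zeros(K + 1)
            shifted[p:] = c[:K + 1 - p]
            c = c - weight * shifted          # multiply the polynomial by (1 - weight * z^p)
    return c

def entropy_from_zeta(T, K, dn=1e-5):
    """h = -(d zeta / dn) / (d zeta / dz) at z = n = 1, using the truncated zeta."""
    zeta_at_1 = lambda n: zeta_coefficients(T, n, K).sum()
    dzeta_dn = (zeta_at_1(1 + dn) - zeta_at_1(1 - dn)) / (2 * dn)
    dzeta_dz = np.dot(np.arange(K + 1), zeta_coefficients(T, 1.0, K))
    return -dzeta_dn / dzeta_dz

# Hypothetical two-state binary HMP (illustration only).
P = np.array([[0.9, 0.2],
              [0.1, 0.8]])
pi = np.array([[0.8, 0.3],
               [0.2, 0.7]])
T = [pi[x][:, None] * P for x in range(2)]

coeffs = zeta_coefficients(T, 1.0, 6)          # c[1] = -(lambda_1 + lambda_2), c[k] = phi_k(1)
print(coeffs)
print([entropy_from_zeta(T, K) for K in (2, 4, 6, 8)])
```

When the transfer matrices commute and are one-dimensional, every coefficient beyond the linear one vanishes, in line with the pairing structure described above; for a genuine HMP the successive truncations give a sequence of entropy estimates whose stabilization can be monitored.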

The set of pairs for each $\varphi_k$ can be divided further into several groups. The first group is formed by (50) and (51)
for $\varphi_2$ and $\varphi_3$, respectively, by (52) for $\varphi_4$, by (54-56) for $\varphi_5$, and by (58-60) for $\varphi_6$. The pairs in this group have the
form $-\lambda_{Al}^n + \lambda_{A+l}^n$, where $l = 1$ or $l = 2$. If $A$ contains $m$ indices and if $m$ is large, we expect $\ln \lambda_A = O(m)$ according
to the discussion in section IV B. Then

$$ -\lambda_{Al}^n + \lambda_{A+l}^n \to 0 \quad {\rm for} \quad m \to \infty. \tag{64} $$

The second group is given by (53) for $\varphi_4$, (57) for $\varphi_5$, and by (61, 62) for $\varphi_6$. In this second group the terms have
the form $-\lambda_{A+B+C} + \lambda_{A+BC} = -\lambda_A (\lambda_{B+C} - \lambda_{BC})$. Here the term $(\lambda_{B+C} - \lambda_{BC})$ has the structure of the first group. For
$B$ and/or $C$ containing a large number of indices, $(\lambda_{B+C} - \lambda_{BC})$ will go to zero.

Finally, the third group appears only for $k \ge 6$. For $k = 6$ this group has only one pair, given by (63). The members
of this third group are of the form $-\lambda_{A+B+CD} + \lambda_{ABD+C}$.
Let us return to (64), which holds, in particular, for $A$ consisting of the same type of indices (e.g., $A$ containing
only 1's). Recalling our discussions after (28) and after (63), and expanding the matrix product indexed by $A$ over its eigenvalues and eigenvectors,
we conclude heuristically that for the convergence radius of $\sum_{k=2}^{\infty} \varphi_k(n) z^k$ in (46) to be sufficiently larger than 1, it
is necessary to have for the transfer-matrices $T(x)$ (using notations (12))

$$ \lambda[T(x)] \not\approx l_1[T(x)], \qquad \lambda[T(x)] \not\approx 1, \tag{65} $$

i.e., the closer $\lambda[T(x)]$ is to $l_1[T(x)]$ and/or $\lambda[T(x)]$ is to 1, the more terms are needed in the expansion (46) for a reliable
estimate of the entropy. Note that if $\lambda[T(x)] = l_1[T(x)] > l_2[T(x)]$, the first relation in (65) should be modified to
$\lambda[T(x)] \not\approx l_2[T(x)]$. We shall meet such examples below.

Recall from (43) that for calculating the entropy we need to know $\zeta(z, n)$ in the vicinity of $z = 1$ and $n = 1$. If the
qualitative conditions (65) are satisfied, we expect that the vicinity of $z = 1$ and $n = 1$ is included in the convergence
area. The convergence of expansions similar to (46) has been discussed in the literature. In particular, Refs. [25], [27] employ
criteria similar to (65) and test them numerically.

In the context of expansion (46) we should mention the results devoted to analyticity properties of the top Lyapunov
exponent [30], [31] and of the entropy for HMP [32]. In particular, it has been stated that the entropy $h$ of HMP is an
analytic function of the Markov transition probabilities (3), provided that these probabilities are positive. At the
moment it is unclear to the present author how in general this analyticity result can be linked to the expansion
(46). However, we show below on concrete examples that the expansion (46) can be recast into an expansion over the
Markov transition probabilities.



VII. THE SIMPLEST AGGREGATED MARKOV PROCESS. 



A. Definition. 



An Aggregated Markov Process (sometimes called a Markov source) is a particular case of HMP, where the
probabilities $\pi(x|s)$ in (5) take only two values, 0 and 1. Thus it is defined by the underlying Markov process $S$ together
with a deterministic function $F(s_i)$ that takes the realizations of the Markov process to those of the aggregated
process: $X = (X_1, X_2, \ldots) = (F(S_1), F(S_2), \ldots)$. The function $F$ is not one-to-one, so that at least two realizations of $S$
are lumped together into one realization of $X$.

The simplest example is given by a Markov process $S = \{S_0, S_1, \ldots\}$ with three realizations $s_i = 1, 2, 3$, such that,
e.g., the realizations 2 and 3 of $S_i$ are not distinguished from each other and correspond to one realization 2 of the
observed process $X_i$ [see Fig. 1]:

$$ F(1) = 1, \qquad F(2) = F(3) = 2, \tag{66} $$

$$ \pi(1|1) = 1, \qquad \pi(1|2) = 0, \qquad \pi(1|3) = 0, \tag{67} $$

$$ \pi(2|1) = 0, \qquad \pi(2|2) = 1, \qquad \pi(2|3) = 1. \tag{68} $$

The transition matrix of a general three-realization Markov process is [see Fig. 1]

$$ P = \begin{pmatrix} 1 - p_1 - p_2 & q_1 & r_1 \\ p_1 & 1 - q_1 - q_2 & r_2 \\ p_2 & q_2 & 1 - r_1 - r_2 \end{pmatrix}, \qquad |{\rm st}\rangle \propto \begin{pmatrix} q_1(r_1 + r_2) + q_2 r_1 \\ r_2(p_1 + p_2) + p_1 r_1 \\ p_2(q_1 + q_2) + p_1 q_2 \end{pmatrix}, \tag{69} $$

where all elements of $P$ are positive, and where we presented the stationary vector $|{\rm st}\rangle$ up to the overall normalization
(footnote 6).

The process $X = \{X_1, X_2, \ldots\}$ has two realizations: $x_i = 1, 2$. The corresponding transfer matrices read from (7)

$$ T(1) = \begin{pmatrix} 1 - p_1 - p_2 & q_1 & r_1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad T(2) = \begin{pmatrix} 0 & 0 & 0 \\ p_1 & 1 - q_1 - q_2 & r_2 \\ p_2 & q_2 & 1 - r_1 - r_2 \end{pmatrix}. \tag{70} $$

Note that the second (sub-dominant) eigenvalue of the transfer-matrix product $T(\mathbf{x}_{N\ldots1}) = \prod_{k=N}^{1} T(x_k)$ (with separate
transfer-matrices defined by (70)) is equal to zero, since this eigenvalue can be presented as that of the matrix $T(1)A$,


6 Note that some authors present the Markov transition matrices $P$ in such a way that the elements in each row sum to one. This amounts
to the transposition of (69). The representation (69) is perhaps more familiar to physicists.
FIG. 1: Schematic representation of the hidden Markov process defined by (66-70). The gray squares and gray arrows indicate,
respectively, the realizations of the internal Markov process and the transitions between the realizations; see (69). The circles
and black arrows indicate the realizations of the observed process. The gray arrows are probabilistic; the corresponding
probabilities are indicated next to them. The black arrows are deterministic; see (66).



where $A$ is some $3 \times 3$ matrix. The only exception, which has a non-zero sub-dominant eigenvalue, is the realization
of $X$ that does not contain 1 at all: $T(2\ldots2) = T^N(2)$.

The considered HMP (66)-(70) belongs to the class of HMP with an unambiguous symbol, since the Markov realization
1 is not corrupted by the noise; see Fig. 1. For such HMP, Ref. [13] reports several results on the analytic features of
the entropy.

B. Unifilar process. 

Before studying in detail the HMP defined by (66)-(70), let us mention one example of HMP where the entropy
can be calculated directly 0, [j| . This unifilar process is defined as follows [f| : for each realization $s_i$ of the Markov
process $S$ consider the realizations $s_j$ with a strictly positive transition probability $p(s_j|s_i) > 0$. Now require that the
realizations $F(s_j)$ of $X_j$ are distinct. Thus, given the realization $s_1$ of $S_1$, there is a one-to-one correspondence between
the realizations of $(X_1, X_2, \ldots)$ and those of $(S_1, S_2, \ldots)$. Write the block-entropy of $X$ as

$$H(X_N, \ldots, X_1) = H(X_N, \ldots, X_1|S_1) + H(S_1) - H(S_1|X_1, \ldots, X_N), \tag{71}$$

where $H(A|B) = -\sum_{a,b} \Pr(a,b) \ln \Pr(a|b)$ is the conditional entropy of the stochastic variable $A$ given $B$. Due to
the definition of the unifilar process: $H(X_N, \ldots, X_1|S_1) = H(S_N, \ldots, S_2|S_1)$. The latter is worked out via the Markov
feature:

$$H(S_N, \ldots, S_2|S_1) = (N-1)\, h_{\rm markov}, \tag{72}$$

$$h_{\rm markov} = -\sum_{k,l} p_{\rm st}(k)\, p(l|k) \ln p(l|k), \tag{73}$$

where $p_{\rm st}(k)$ is the stationary Markov probability defined in (4), and where $p(l|k)$ are the Markov transition
probabilities from (3). Since $H(S_1)$ and $H(S_1|X_1, \ldots, X_N)$ in (71) are finite in the limit $N \to \infty$, the entropy $h(X)$ of the
unifilar process reduces to that of the underlying Markov process $h_{\rm markov}$ [E] .

Note that any finite-order Markov process (conventionally assuming that the usual Markov process is of first order)
can be presented as a unifilar process. There are, however, unifilar processes that do not reduce to any finite-order
Markov process [j| $^{7}$. The main problem in identifying unifilar processes is that even if $X$ is not unifilar for a given $S$,
it can still be unifilar with respect to another Markov process $S'$ (see section VII C below for the simplest example).
This makes the recognition of unifilar processes that do not reduce to any finite-order Markov process especially
difficult.



$^{7}$ The example of such a process given in [j| is not minimal. The minimal example is given by a four-realization Markov process with
non-zero transition probabilities $p(4|1)$, $p(3|4)$, $p(2|3)$, $p(1|2)$, $p(1|1)$, $p(2|2)$, $p(3|3)$ and $p(4|4)$ (all other transition probabilities are
zero), and two realizations of $X_i$ such that $F(1) = F(3) = 1$, $F(2) = F(4) = 2$. The unifilar process $X$ does not reduce to a finite-order
Markov process, since, e.g., there are two different mechanisms of producing the sequence $1\ldots1$. This means that $P(1|111)$ is not equal
to $P(1|11)$, etc.

C. Particular cases.



We now return to the HMP (66)-(70) and discuss some of its particular cases.

1. For $q_2 = r_2$ and $q_1 = r_1$ all the terms $\varphi_k$ with $k \geq 3$ in the expansion (46) are zero. One can check that for this
case the observed process $X$ is by itself Markov.

2. For $(1 - q_1 - q_2)(1 - r_1 - r_2) = q_2 r_2$, one can check that $\varphi_k = 0$ for $k \geq 4$. Now the process $X$ is second-order
Markov: $P(x_k|x_{k-1}, x_{k-2}, x_{k-3}) = P(x_k|x_{k-1}, x_{k-2})$.

Thus at least for these two cases the calculation of the entropy is straightforward.

The above two facts tend to clarify the meaning of the expansion (46). It is tempting to suggest that if the expansion
(46) is cut precisely at a positive integer $K \geq 2$, i.e., $\varphi_{k \geq K} = 0$, then the corresponding process $X$ is Markovian of order $K - 2$.
If true, this will give convenient conditions for deciding on the finite-order Markov feature, and will mean
that the successive terms in (46) are in fact approximations of the HMP by finite-order Markov processes.



D. Upper and lower bounds for the entropy. 

Before presenting the main results of this section, let us recall that the entropy of any (stationary) HMP satisfies
the following inequalities [|[ $^{8}$:

$$H(X_2|S_1) \leq h \leq H(X_2|X_1) = H(2) - H(1), \tag{74}$$

where $H(A|B) = -\sum_{a,b} \Pr(A = a, B = b) \ln \Pr(A = a|B = b)$ and $H(N)$ are, respectively, the conditional entropy
and the block entropy defined in (16). Employing (5), (7) we deduce

$$\Pr(X_2 = x|S_1 = s) = \sum_{s'=1}^{L} T_{s's}(x). \tag{75}$$

This equation together with the stationary probability (69) of the Markov process is sufficient for calculating $H(X_2|S_1)$
for the HMP (66)-(70):

$$H(X_2|S_1) = p_{\rm st}(1)\, \chi(p_1 + p_2) + p_{\rm st}(2)\, \chi(q_1) + p_{\rm st}(3)\, \chi(r_1), \tag{76}$$
$$\chi(p) = -p \ln p - (1-p) \ln(1-p). \tag{77}$$

The upper bound $H(X_2|X_1)$ is calculated directly from ((51 fTTTl fTBj) .
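
The two bounds in (74) are simple enough to evaluate directly. The following Python sketch does so for the HMP (66)-(70), using the stationary vector (69) and the function (77). The helper names (`chi`, `stationary`, `entropy_bounds`) are illustrative choices, not notation from the text, and the parameter values are borrowed from the first row of Table I (with $r_2 = 0$ and $r_1 = q_1 + q_2$). The upper bound is obtained here by conditioning on $X_1$ explicitly, which is one possible route and is an assumption about how the calculation is organized.

```python
import math

def chi(p):
    """Binary entropy function (77), in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def stationary(p1, p2, q1, q2, r1, r2):
    """Normalized stationary vector of the transition matrix (69)."""
    w = (q1 * (r1 + r2) + q2 * r1,
         r2 * (p1 + p2) + p1 * r1,
         p2 * (q1 + q2) + p1 * q2)
    total = sum(w)
    return tuple(x / total for x in w)

def entropy_bounds(p1, p2, q1, q2, r1, r2):
    """Return (H(X_2|S_1), H(X_2|X_1)) for the HMP (66)-(70)."""
    st = stationary(p1, p2, q1, q2, r1, r2)
    # Lower bound (76): X_2 = 1 happens exactly when S_2 = 1.
    lower = st[0] * chi(p1 + p2) + st[1] * chi(q1) + st[2] * chi(r1)
    # Upper bound: X_1 = 1 iff S_1 = 1, while X_1 = 2 lumps the states 2 and 3.
    prob_x1_is_1 = st[0]
    to_1_given_2 = (st[1] * q1 + st[2] * r1) / (st[1] + st[2])
    upper = (prob_x1_is_1 * chi(p1 + p2)
             + (1.0 - prob_x1_is_1) * chi(to_1_given_2))
    return lower, upper

# Parameters of the first row of Table I (r_2 = 0, r_1 = q_1 + q_2).
lower, upper = entropy_bounds(0.75, 0.10, 0.25, 0.20, r1=0.45, r2=0.0)
print(f"H(X2|S1) = {lower:.6f}, H(X2|X1) = {upper:.6f}")
```

For other parameter sets only the call to `entropy_bounds` needs to change; the function makes no use of the special structure of the exactly solvable cases.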



E. Generating function and entropy: exact results. 

For a particular four-parametric class of the HMP (66)-(70) we were able to sum exactly the expansion (46).$^{9}$ This
class is characterized by the condition that the two leading eigenvalues of the transfer-matrix $T(2)$ in (70) have equal
absolute values [the third eigenvalue is equal to zero]:

$$|\lambda[T(2)]| = |\lambda_1[T(2)]|. \tag{78}$$

A direct inspection shows that this condition amounts to two possible forms (80) and (88) of the transition matrix $P$.
These two cases are studied below.

1. First case.

For this first case the transition matrix is obtained from (70) under$^{10}$

$$r_2 = 0 \quad {\rm and} \quad r_1 = q_1 + q_2. \tag{79}$$



$^{8}$ Eq. (74) is a particular case of a slightly more general inequality [rj llOjl . For our purely illustrative purposes (74) is sufficient.
$^{9}$ This was done by hand, checking the separate terms of the expansion (44).
$^{10}$ Or, alternatively, via $q_2 = 0$ and $q_1 = r_1 + r_2$. This, however, does not amount to anything new as compared to (80).

FIG. 2: Entropy (85) of the HMP (69), (70), (80) versus $q = q_2$ for $p_2 = q_1 = 0$. Normal line: $p_1 = 0.5$. Thick line: $p_1 = 0.75$. Upper
dashed line: $p_1 = 0.05$. Lower dashed line: $p_1 = 0.01$. It is seen that for a small value of $p_1$, the entropy $h$ is nearly constant
for a range of $q = q_2$.



This leads from (69) to the transition matrix

$$P = \begin{pmatrix} 1-p_1-p_2 & q_1 & q_1+q_2 \\ p_1 & 1-q_1-q_2 & 0 \\ p_2 & q_2 & 1-q_1-q_2 \end{pmatrix}. \tag{80}$$

It is seen that the realization $\{S_{k+1} = 2, S_k = 3\}$ for the Markov process is prohibited. For the HMP there are no
prohibited sequences.

The inverse zeta-function reads from (46):

$$\xi(z,n) = 1 - \left[(1-p_1-p_2)^n + (1-q_1-q_2)^n\right] z + \left[(1-p_1-p_2)^n (1-q_1-q_2)^n - \left(p_1 q_1 + p_2(q_1+q_2)\right)^n\right] z^2 + z^3 \left[p_1 q_2 (q_1+q_2)\right]^n \left[\Phi(y,-n,b) - \Phi(y,-n,b+1)\right], \tag{81}$$

where we defined

$$b = (1-q_1-q_2)\, \frac{p_2(q_1+q_2) + p_1 q_1}{p_1 q_2 (q_1+q_2)}, \tag{82}$$

$$y = (1-q_1-q_2)^n z, \tag{83}$$

and where $\Phi(y,-n,b)$ is the Lerch $\Phi$-function:

$$\Phi(y,-n,b) = \sum_{k=0}^{\infty} (k+b)^n y^k. \tag{84}$$

In this representation, which led to (81), the sum converges for $|y| < 1$, i.e., for $|z| < (1-q_1-q_2)^{-n}$, where $(1-q_1-q_2)^{-n} > 1$. The convergence
radius tends to one for $q_1 + q_2 \to 0$, or, equivalently, for $\lambda[T(2)] \to 1$; see (70). This violates the second qualitative
condition in (65).
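
The closed form (81) can be checked numerically. The Python sketch below evaluates (81)-(84) with a truncated Lerch series and then estimates the entropy by finite differences. Two points are assumptions rather than statements from the text: the relations (41) and (43) are not reproduced in this section, so the sketch assumes that the entropy follows from $h = -\partial_n \xi / \partial_z \xi$ at $z = n = 1$ (implicit differentiation of $\xi(z(n),n) = 0$); and the function names and truncation length are illustrative choices. The parameters are those of Table I.

```python
import math

def lerch_neg(y, n, b, terms=400):
    """Truncated series (84): sum over k >= 0 of (k + b)^n y^k, for |y| < 1."""
    return sum((k + b) ** n * y ** k for k in range(terms))

def xi(z, n, p1, p2, q1, q2):
    """Inverse zeta-function (81) of the first exactly solvable case."""
    a = 1.0 - p1 - p2                 # leading eigenvalue of T(1)
    c = 1.0 - q1 - q2                 # degenerate eigenvalue of T(2)
    s = q1 + q2
    lam12 = p1 * q1 + p2 * s          # leading eigenvalue of T(1)T(2)
    g = p1 * q2 * s
    b = c * lam12 / g                 # definition (82)
    y = c ** n * z                    # definition (83)
    tail = lerch_neg(y, n, b) - lerch_neg(y, n, b + 1.0)
    return (1.0 - (a ** n + c ** n) * z
            + (a ** n * c ** n - lam12 ** n) * z ** 2
            + z ** 3 * g ** n * tail)

def entropy_from_xi(p1, p2, q1, q2, d=1e-5):
    """Assumed relation: h = -(d xi / d n) / (d xi / d z) at z = n = 1."""
    args = (p1, p2, q1, q2)
    d_n = (xi(1.0, 1.0 + d, *args) - xi(1.0, 1.0 - d, *args)) / (2.0 * d)
    d_z = (xi(1.0 + d, 1.0, *args) - xi(1.0 - d, 1.0, *args)) / (2.0 * d)
    return -d_n / d_z

print("xi(1, 1) =", xi(1.0, 1.0, 0.75, 0.10, 0.25, 0.20))
print("h        =", entropy_from_xi(0.75, 0.10, 0.25, 0.20))
```

The first printed number should vanish up to rounding, reflecting the normalization of probabilities at $n = 1$; the second can be compared with the exact value of $h$ listed in Table I.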

Using (43) we get from (81) for the entropy:

$$h = -\frac{1}{p_1+p_2+q_1+q_2+\frac{p_1 q_2}{q_1+q_2}} \Big\{ (1-p_1-p_2)(q_1+q_2) \ln(1-p_1-p_2) + (1-q_1-q_2)\Big(p_1+p_2+\frac{p_1 q_2}{q_1+q_2}\Big) \ln(1-q_1-q_2)$$
$$\qquad + p_1 q_2 \ln[p_1 q_2 (q_1+q_2)] + [(p_1+p_2) q_1 + p_2 q_2] \ln[(p_1+p_2) q_1 + p_2 q_2]$$
$$\qquad + p_1 q_2 (q_1+q_2) \left[ \Phi'^{[2]}(1-q_1-q_2,-1,b) - \Phi'^{[2]}(1-q_1-q_2,-1,b+1) \right] \Big\}, \tag{85}$$

where $b$ is defined in (82), and where

$$\Phi'^{[2]}(y,-1,b) = -\sum_{k=0}^{\infty} \ln(k+b)\, (k+b)\, y^k. \tag{86}$$

TABLE I: For two sets of parameters of the HMP (66), (79), (80) we present the exact value of the entropy $h$ obtained from (85), the
lower bound $H(X_2|S_1)$, and the upper bound $H(X_2|X_1)$; see (74).

Parameters                                          h          H(X_2|S_1)   H(X_2|X_1)
p_1 = 0.75, p_2 = 0.10, q_1 = 0.25, q_2 = 0.20      0.569580   0.557243     0.572373
p_1 = 0.30, p_2 = 0.20, q_1 = 0.55, q_2 = 0.10      0.684796   0.682486     0.684843

FIG. 3: The rate functions $f(\eta)$ and $g(\eta)$ defined by (37) and (39), respectively, for the HMP given by (70), (88), (89). Normal
line: $g(\eta)$. Dashed line: $f(\eta)$. For the parameters in (88) we take: $p_1 = 0.2$, $p_2 = 0.3$, $q = 0.05$, and $r = 0.01$. For these values
the entropy (90) is $h = 0.166671$.



The behavior of $h$ is illustrated in Fig. 2 for particular values of $p_1$, $p_2$, $q_1$ and $q_2$. Table I compares the exact
expression (85) with the upper and lower bounds (74).

The analytic features of $h$ given by (85) as a function of the Markov transition probabilities $p_1$, $p_2$, $q_1$ and $q_2$ agree
with the results obtained in [32]. In particular, note that for $p_1 + p_2 \to 1$ the entropy $h$ becomes non-analytic due to
the term $\propto (1 - p_1 - p_2) \ln(1 - p_1 - p_2)$.




FIG. 4: The same as in Fig. 3 but with $q = 0.1$ and $r = 0.4$. For these values the entropy (90) is $h = 0.619519$, which is larger
than the entropy in Fig. 3.

TABLE II: For two sets of parameters of the HMP (69), (70), (88) we present the exact value of the entropy $h$ obtained from (90), the
lower bound $H(X_2|S_1)$, and the upper bound $H(X_2|X_1)$; see (74). The parameters $p_1$, $p_2$, $q$ and $r$ are tuned such that $H(X_2|S_1)$
and $H(X_2|X_1)$ provide rather tight bounds on $h$.

Parameters                                  h          H(X_2|S_1)   H(X_2|X_1)
p_1 = 0.1, p_2 = 0.1, q = 0.2, r = 0.3      0.528531   0.525571     0.528534
p_1 = 0.2, p_2 = 0.2, q = 0.3, r = 0.4      0.659897   0.656974     0.659901



2. Second case. 



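The second possibility of satisfying (78) is given by

$$q_1 + q_2 = 1 \quad {\rm and} \quad r_1 + r_2 = 1, \tag{87}$$

which, denoting $q \equiv q_1$ and $r \equiv r_1$, brings the transition matrix (69) to the form

$$P = \begin{pmatrix} 1-p_1-p_2 & q & r \\ p_1 & 0 & 1-r \\ p_2 & 1-q & 0 \end{pmatrix}. \tag{88}$$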

The realizations of the corresponding Markov process do not contain $\{S_{k+1} = 2, S_k = 2\}$ and $\{S_{k+1} = 3, S_k = 3\}$.
Again, the realizations of the HMP do not have any prohibited sequence.
The inverse zeta-function reads from (46)

$$\xi(z,n) = 1 - \left[(1-p_1-p_2)^n + (1-q)^{n/2}(1-r)^{n/2}\right] z + \left[-(p_1 q + p_2 r)^n + (1-p_1-p_2)^n (1-q)^{n/2}(1-r)^{n/2}\right] z^2$$
$$\qquad + \frac{z^3}{1 + z (1-q)^{n/2}(1-r)^{n/2}} \left[ (p_1 q + p_2 r)^n (1-q)^{n/2}(1-r)^{n/2} - \left( p_1 r (1-q) + p_2 q (1-r) \right)^n \right]. \tag{89}$$

The series that led to (89) converges for $|z| < (1-q)^{-n/2}(1-r)^{-n/2}$. Again the convergence radius going to one
violates the second qualitative condition in (65).
Eqs. (32), (89) imply for the source entropy:



$$h = -\frac{1}{2(p_1+p_2) + q(1-p_1) + r(1-p_2) - qr} \Big\{ [q(1-r) + r](1-p_1-p_2) \ln(1-p_1-p_2)$$
$$\qquad + (p_1+p_2)(1-q)(1-r) \ln[(1-q)(1-r)] + (p_1 q + p_2 r) \ln[p_1 q + p_2 r]$$
$$\qquad + [p_2 q (1-r) + p_1 (1-q) r] \ln[p_2 q (1-r) + p_1 (1-q) r] \Big\}. \tag{90}$$



Applying the general definition (73) of the Markov entropy to the particular case (69) we get for the Markov entropy

$$h_{\rm markov} = -\frac{1}{2(p_1+p_2) + q(1-p_1) + r(1-p_2) - qr} \Big\{ [q(1-r) + r] \left[ (1-p_1-p_2) \ln(1-p_1-p_2) + p_1 \ln p_1 + p_2 \ln p_2 \right]$$
$$\qquad + [(1-r)(p_1+p_2) + p_1 r] \left[ q \ln q + (1-q) \ln(1-q) \right]$$
$$\qquad + [p_2 + p_1(1-q)] \left[ r \ln r + (1-r) \ln(1-r) \right] \Big\}. \tag{91}$$
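
The expressions (90) and (91) are elementary functions of $p_1$, $p_2$, $q$ and $r$, so they are easy to tabulate. The following Python sketch evaluates both for the parameter sets of Table II. The function names are illustrative, and the Markov entropy is computed here from the general definition (73) with the stationary vector of (69) specialized to $q_1 = q$, $q_2 = 1 - q$, $r_1 = r$, $r_2 = 1 - r$, which is assumed to be equivalent to (91).

```python
import math

def xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def hmp_entropy(p1, p2, q, r):
    """Closed form (90) for the entropy of the HMP (88)."""
    a = 1.0 - p1 - p2
    m = (1.0 - q) * (1.0 - r)
    u = p1 * q + p2 * r
    v = p2 * q * (1.0 - r) + p1 * (1.0 - q) * r
    norm = 2.0 * (p1 + p2) + q * (1.0 - p1) + r * (1.0 - p2) - q * r
    return -((1.0 - m) * xlogx(a) + (p1 + p2) * xlogx(m)
             + xlogx(u) + xlogx(v)) / norm

def markov_entropy(p1, p2, q, r):
    """Definition (73) applied to the transition matrix (88)."""
    columns = ((1.0 - p1 - p2, p1, p2),   # transitions out of state 1
               (q, 0.0, 1.0 - q),         # transitions out of state 2
               (r, 1.0 - r, 0.0))         # transitions out of state 3
    weights = (q * (1.0 - r) + r,
               (1.0 - r) * (p1 + p2) + p1 * r,
               p2 + p1 * (1.0 - q))       # unnormalized stationary vector
    total = sum(weights)
    return -sum(w / total * sum(xlogx(x) for x in col)
                for w, col in zip(weights, columns))

for params in ((0.1, 0.1, 0.2, 0.3), (0.2, 0.2, 0.3, 0.4)):
    print(params, hmp_entropy(*params), markov_entropy(*params))
```

The same `hmp_entropy` function can be called with the parameters quoted in the captions of Figs. 3 and 4 to reproduce the entropies stated there.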



Comparing (90) and (91) one can check [e.g., numerically] that $h_{\rm markov} > h$, as it should be, since lumping several states
together decreases the entropy. Table II compares the exact value (90) for the entropy with the upper and lower
bounds (74).

3. Rate functions for large deviations.

Recall that the rate functions $f(\eta)$ and $g(\eta)$ defined in section V describe the weight of atypical sequences with
probability smaller [for $f(\eta)$] or larger [for $g(\eta)$] than the typical sequence probability $e^{-Nh}$. The positive parameter $\eta$ defines the amount
of this smallness (largeness); see (37) and (39).

The calculation of $f(\eta)$ and $g(\eta)$ for the considered HMP model (88), (70) is straightforward. One finds the zero
of the $\xi$-function given by (89). This defines, via (41), the moment-generating function $\Lambda(n)$. If there are several
zeros of $\xi(z,n)$ as a function of $z$, we select the one that goes to $z = 1$ for $n \to 1$. Then $f(\eta)$ and $g(\eta)$ are calculated
from their definitions (37) and (39).
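
The zero-finding step described above can be sketched in Python as follows. Since (41) is not reproduced in this section, the sketch assumes that it amounts to $\Lambda(n) = 1/z(n)$, where $z(n)$ is the relevant zero of (89); it also assumes that the zero passing through $z = 1$ at $n = 1$ is the smallest positive one. The function names, the scan step and the search limit are illustrative choices. The rate functions themselves are not computed, because their definitions (37) and (39) are not available here.

```python
import math

def xi(z, n, p1, p2, q, r):
    """Inverse zeta-function (89)."""
    a = (1.0 - p1 - p2) ** n
    mu = ((1.0 - q) * (1.0 - r)) ** (n / 2.0)
    u = (p1 * q + p2 * r) ** n
    v = (p1 * r * (1.0 - q) + p2 * q * (1.0 - r)) ** n
    return (1.0 - (a + mu) * z + (a * mu - u) * z ** 2
            + z ** 3 * (u * mu - v) / (1.0 + z * mu))

def smallest_zero(n, params, step=0.01, z_max=50.0):
    """Scan upward from z = 0 for the first sign change, then bisect."""
    lo = 0.0
    while lo < z_max:
        hi = lo + step
        if xi(hi, n, *params) <= 0.0:
            for _ in range(80):
                mid = 0.5 * (lo + hi)
                if xi(mid, n, *params) > 0.0:
                    lo = mid
                else:
                    hi = mid
            return 0.5 * (lo + hi)
        lo = hi
    raise ValueError("no zero found below z_max")

def moment_generating(n, params):
    """Assumed form of (41): Lambda(n) = 1 / z(n)."""
    return 1.0 / smallest_zero(n, params)

def entropy_from_lambda(params, d=1e-5):
    """h = -d Lambda / d n at n = 1, by a central difference."""
    return -(moment_generating(1.0 + d, params)
             - moment_generating(1.0 - d, params)) / (2.0 * d)

fig3 = (0.2, 0.3, 0.05, 0.01)      # parameters quoted in the caption of Fig. 3
print(moment_generating(1.0, fig3), entropy_from_lambda(fig3))
```

As a consistency check, $\Lambda(1)$ should equal one and the slope at $n = 1$ should reproduce the entropy (90) for the same parameters.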

The behavior of $f(\eta)$ and $g(\eta)$ as functions of $\eta$ is presented in Figs. 3 and 4. For each figure we take different sets
of parameters $p_1$, $p_2$, $q$ and $r$; see (88) for their definition. To make this difference explicit let us denote $f_3(\eta)$, $g_3(\eta)$
and $f_4(\eta)$, $g_4(\eta)$ for Fig. 3 and Fig. 4, respectively.

Now let us observe that

$$f_3(\eta) < f_4(\eta), \quad g_3(\eta) < g_4(\eta), \tag{92}$$

$$g_3(\eta) > f_3(\eta), \quad g_4(\eta) < f_4(\eta). \tag{93}$$

For explaining these inequalities we note that for the parameters of Fig. 3 the entropy is smaller than $h$ in Fig. 4,

$$h_3 < h_4, \tag{94}$$

which means that the typical set for Fig. 4 contains more sequences, so fewer of them remain outside, which
may explain (92). For the same reason (94), the probability of each typical sequence is higher for the parameters in
Fig. 3. Thus for the parameters presented in Fig. 3 more high-probability sequences are included in the corresponding
typical set. This may explain (93).

In further numerical checks it was noted that the above relation between (92) and (93) on the one side, and (94)
on the other side, seems to be much more general than these particular examples.

VIII. BINARY SYMMETRIC HIDDEN MARKOV PROCESS. 

A. Definition and symmetries. 

This is another popular (and simple to define) example of HMP. Now the Markov process has two states 1 and 2.
The realizations of the observed (hidden Markov) process also take two values 1 and 2. The internal Markov process
is driven by the conditional probability

$$\begin{pmatrix} p(1|1) & p(1|2) \\ p(2|1) & p(2|2) \end{pmatrix} = \begin{pmatrix} 1-q & q \\ q & 1-q \end{pmatrix}. \tag{95}$$

The stationary probability for this Markov process is found via (4): $p_{\rm st}(1) = p_{\rm st}(2) = \frac{1}{2}$.
The probabilities for the observations 1 or 2 given the internal state read

$$\pi(x_i|s_i) = \begin{pmatrix} \pi(1|1) & \pi(1|2) \\ \pi(2|1) & \pi(2|2) \end{pmatrix} = \begin{pmatrix} 1-e & e \\ e & 1-e \end{pmatrix}, \tag{96}$$

where $e$ is the error probability during the observation.
For the transfer matrices we have:

$$T(1) = \begin{pmatrix} (1-e)(1-q) & (1-e)q \\ eq & e(1-q) \end{pmatrix}, \qquad T(2) = \begin{pmatrix} e(1-q) & eq \\ (1-e)q & (1-e)(1-q) \end{pmatrix}. \tag{97}$$



$T(2)$ is obtained from $T(1)$ via $e \to 1 - e$.

TABLE III: For two sets of the parameters $q$ and $e$ of the binary symmetric HMP (95), (96), (97) we present the entropy $h$ obtained
by approximating (46) via a polynomial of order 2, 13 and 12, respectively. These values are denoted by $h_2$, $h_{13}$ and $h_{12}$. We
compare $h_k$ with the lower bound $H(X_2|S_1)$ and the upper bound $H(X_2|X_1)$; see (74). It is seen that the relative difference
$(h_{13} - h_2)/h_{13}$ is not larger than 0.02.

Parameters             h_2        h_13       h_12       H(X_2|S_1)   H(X_2|X_1)
q = 0.2,  e = 0.45     0.687811   0.693108   0.693100   0.691346     0.693129
q = 0.25, e = 0.4      0.681322   0.692884   0.692881   0.688139     0.692947



The following symmetry features are deduced directly from the definitions (95)-(97).

(1) For any $N$ the probability $P(x_N, \ldots, x_1; q, e)$ of the binary symmetric HMP is invariant with respect to $e \to 1 - e$:
$P(x_N, \ldots, x_1; q, e) = P(x_N, \ldots, x_1; q, 1 - e)$.

(2) The probability $P(x_N, \ldots, x_1; q, e)$ is invariant with respect to the full "inversion" of the realization $(x_N, \ldots, x_1)$,
e.g. $P(1,2,1,1; q, e) = P(2,1,2,2; q, e)$.

(3) In general, the probability $P(x_N, \ldots, x_1; q, e)$ is not invariant with respect to $q \to 1 - q$, e.g., $P(1,2; q, e) -
P(1,2; 1-q, e) = \frac{1}{2}(1 - 2e)^2(2q - 1)$. However, for each given realization $(x_N, \ldots, x_1)$ one can find another unique
realization $(\bar{x}_N, \ldots, \bar{x}_1)$ such that $P(x_N, \ldots, x_1; q, e) = P(\bar{x}_N, \ldots, \bar{x}_1; 1 - q, e)$. The logic of relating $(\bar{x}_N, \ldots, \bar{x}_1)$ to
$(x_N, \ldots, x_1)$ should be clear from the following example: if $(x_4, \ldots, x_1) = (1,2,2,1)$, then $(\bar{x}_4, \ldots, \bar{x}_1) = (2,2,1,1)$.
In more detail, $\bar{x}_4 = 2$ is defined to be different from $x_4 = 1$, and once $x_3 = 2$ is different from $x_4 = 1$, $\bar{x}_3 = 2$ does
not differ from $\bar{x}_4 = 2$, etc. It should be clear (e.g., by induction) that for a given $(x_N, \ldots, x_1)$, $(\bar{x}_N, \ldots, \bar{x}_1)$ is indeed
unique.

This feature means, in particular, that the entropy $h$ of the binary symmetric HMP, being according to (16), (17)
a symmetric function of all probabilities $P(x_N, \ldots, x_1)$, is invariant with respect to $q \to 1 - q$: $h(q, e) = h(1 - q, e)$,
in addition to being invariant with respect to $e \to 1 - e$.

(4) In general, the probabilities $P(x_N, \ldots, x_1)$ are not invariant with respect to a cyclic interchange of the
realizations, e.g., $P(1,2,1; q, e) - P(1,1,2; q, e) = \frac{1}{2}(1 - 2e)^2 q (2q - 1)$.
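
The symmetry features (1)-(4) can be verified numerically from the transfer matrices. The Python sketch below assumes that $T_{s's}(x) = \pi(x|s')\,p(s'|s)$ and that $P(x_N, \ldots, x_1)$ is obtained by applying $T(x_1), \ldots, T(x_N)$ successively to the stationary vector and summing the components; the function names (`prob`, `invert`, `staggered_flip`) are illustrative and do not appear in the text.

```python
from itertools import product

def transfer(x, q, e):
    """Transfer matrix T(x) of the binary symmetric HMP (95)-(97)."""
    emit = (1.0 - e, e) if x == 1 else (e, 1.0 - e)   # pi(x|s') for s' = 1, 2
    p = ((1.0 - q, q), (q, 1.0 - q))
    return tuple(tuple(emit[i] * p[i][j] for j in range(2)) for i in range(2))

def prob(xs, q, e):
    """P(x_N, ..., x_1; q, e), with xs written as (x_N, ..., x_1)."""
    v = (0.5, 0.5)                                    # stationary vector
    for x in reversed(xs):                            # T(x_1) acts first
        t = transfer(x, q, e)
        v = (t[0][0] * v[0] + t[0][1] * v[1],
             t[1][0] * v[0] + t[1][1] * v[1])
    return v[0] + v[1]

def invert(xs):
    """Full inversion 1 <-> 2 of feature (2)."""
    return tuple(3 - x for x in xs)

def staggered_flip(xs):
    """Flip x_N, keep x_{N-1}, flip x_{N-2}, ... as in the example of feature (3)."""
    return tuple(3 - x if i % 2 == 0 else x for i, x in enumerate(xs))

q, e = 0.2, 0.1
print(prob((1, 2, 2, 1), q, e), prob(staggered_flip((1, 2, 2, 1)), 1.0 - q, e))
```

The two printed probabilities should coincide, illustrating the mapping described in feature (3).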

For the considered binary symmetric HMP we did not find any exactly solvable situation. Thus, we employed (46)
and calculated $\xi(z,n)$ by approximating the infinite sum in the RHS of (46) via a polynomial of order $K$: $\sum_{k=2}^{K} \varphi_k(n) z^k$.$^{11}$
This approximation was suggested in [25] and it is based on the fact that the sum is supposed to converge
exponentially at least in the vicinity of $z = 1$ and $n = 1$. This is what we saw for the exactly solvable situations (81) and (89).
The qualitative criterion for the exponential convergence was suggested in [25, 27] and was discussed by us around (65).
Since both transfer-matrices in (97) have the same eigenvalues

$$\frac{1}{2} \left[ 1 - q \pm \sqrt{q^2 + (1-2q)(1-2e)^2} \right], \tag{98}$$

for the studied binary symmetric HMP there are several cases where the [qualitative] conditions (65) are violated: i)
$q \to 0$ and $e \to \frac{1}{2}$; ii) $q \to 1$; iii) $q \to 0$ and $e \to 0$. In these three cases we expect that approximating $\xi(z,n)$
by $\sum_{k=2}^{K} \varphi_k(n) z^k$ will not be feasible, since large values of $K$ will be required to achieve a reasonably high precision.
Fig. 5 and Table III present the results for the entropy obtained in the above approximate way and compare them
with the upper and lower bounds, as given by (74).
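
The polynomial approximation can be sketched in Python. Because (43) and (46) are not reproduced in this section, the sketch rests on two explicit assumptions: that $\xi(z,n) = \exp\left[-\sum_m \frac{z^m}{m} \sum_{x_1 \ldots x_m} \lambda^n[T(x_m \ldots x_1)]\right]$ with $\lambda$ the leading eigenvalue (the standard cycle-expansion form), and that the order-$K$ entropy is $h_K = -\partial_n \xi_K / \partial_z \xi_K$ at $z = n = 1$, where $\xi_K$ is the truncated polynomial. All function names are illustrative.

```python
import math
from itertools import product

def transfer(x, q, e):
    """Transfer matrix T(x) of the binary symmetric HMP (97)."""
    emit = (1.0 - e, e) if x == 1 else (e, 1.0 - e)
    p = ((1.0 - q, q), (q, 1.0 - q))
    return tuple(tuple(emit[i] * p[i][j] for j in range(2)) for i in range(2))

def matmul(a, b):
    return tuple(tuple(sum(a[i][k] * b[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def leading_eigenvalue(m):
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return 0.5 * (tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0)))

def entropy_order_k(q, e, K):
    """Assumed order-K estimate h_K = -(d xi_K / d n) / (d xi_K / d z) at z = n = 1."""
    T = {1: transfer(1, q, e), 2: transfer(2, q, e)}
    c = [0.0] * (K + 1)      # c[m]  = sum over words of length m of lambda
    dc = [0.0] * (K + 1)     # dc[m] = derivative of sum lambda^n over n at n = 1
    for m in range(1, K + 1):
        for word in product((1, 2), repeat=m):
            mat = T[word[0]]
            for x in word[1:]:
                mat = matmul(mat, T[x])
            lam = leading_eigenvalue(mat)
            c[m] += lam
            dc[m] += lam * math.log(lam)
    # Power-series coefficients of exp(-sum_m c[m] z^m / m) and their n-derivatives.
    phi = [1.0] + [0.0] * K
    dphi = [0.0] * (K + 1)
    for k in range(1, K + 1):
        phi[k] = -sum(c[m] * phi[k - m] for m in range(1, k + 1)) / k
        dphi[k] = -sum(dc[m] * phi[k - m] + c[m] * dphi[k - m]
                       for m in range(1, k + 1)) / k
    return -sum(dphi) / sum(k * phi[k] for k in range(1, K + 1))

for order in (2, 6):
    print(order, entropy_order_k(0.2, 0.45, order))
```

The cost grows as $2^K$ because all words up to length $K$ are enumerated; a more careful implementation would sum over prime cycles only.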



B. Small-noise limit. 



For $e = \frac{1}{2}$ or for $q = \frac{1}{2}$ the process becomes memory-less: $P(x_1, \ldots, x_N) = P(x_1) \cdots P(x_N)$. Here all the functions
$\varphi_k$ in (46) are equal to zero. Another particular case is the limit $e \to 0$ (no noise), where the hidden Markov process
degenerates into the original Markov process. It is straightforward to check that in (46) for the entropy only the term
$\varphi_2$ is different from zero, while $\varphi_k = 0$ for $k \geq 3$. This produces the well-known expression (73) for the entropy of a
Markov process.



$^{11}$ The terms in this expansion can perhaps be re-arranged so as to facilitate the convergence. Since in the present paper the numerical
calculations serve mainly illustrative purposes, we shall not dwell on this aspect.

FIG. 5: Entropy of the binary hidden Markov chain (normal line) versus the error probability $e$ for $q = 0.1$. Dashed lines:
upper and lower bounds for the entropy as given by (74). The entropy is calculated from (46), (43), approximating the infinite
sum in (46) by a polynomial of order 13.



Let us work out the vicinity of $e = 0$, assuming that $e$ is small (quasi-Markov situation). One can check that

$$\varphi_k = O(e^{k-2}) \quad {\rm for} \quad k \geq 3. \tag{99}$$

Thus for finding the entropy and the generating function within the order $O(e^2)$, we need to expand $\varphi_k$ with $k =
1, 2, 3, 4$ over $e$ and select all the terms of order $O(e)$ and $O(e^2)$. We write down explicitly the approximation of $\xi(z,n)$
via the polynomial of order 4 (higher-order terms $\varphi_{k \geq 5}$ are not needed, since they do not contribute to the order $O(e^2)$):

$$\xi(z,n) = 1 + z \varphi_1(n) + z^2 \varphi_2(n) + z^3 \varphi_3(n) + z^4 \varphi_4(n) + O(e^3). \tag{100}$$

Using (98) we get after straightforward algebraic calculations (taking for simplicity $q < \frac{1}{2}$)

$$\varphi_1(n) = -2(1-q)^n + 2en(1-q)^{n-2}(1-2q) - e^2 n (1-q)^{n-4}(1-2q) \left\{ (1-2q)(n-1-q) + q \right\} + O(e^3), \tag{101}$$

$$\varphi_2(n) = (1-q)^{2n} - q^{2n} - 2en(1-2q) \left[ (1-q)^{2(n-1)} + q^{2(n-1)} \right]$$
$$\qquad - e^2 n (1-2q) \left[ q^{2(n-2)} \left\{ (1-2q)(q+2n-3) - q \right\} + (1-q)^{2(n-2)} \left\{ (1-2q)(q+1-2n) - q \right\} \right] + O(e^3), \tag{102}$$

$$\varphi_3(n) = 2en(1-2q)^2 (1-q)^{n-2} q^{2(n-1)}$$
$$\qquad - e^2 n (1-2q)^2 (1-q)^{n-4} q^{2(n-2)} \left[ 5 - 3n + 4q(3n-5) + 2q^2(16-7n) + 4q^3(n-6) + 10q^4 \right] + O(e^3), \tag{103}$$

$$\varphi_4(n) = e^2 n (1-2q)^3 (1-q)^{2(n-2)} q^{2(n-2)} \left[ 2 - 4q(1-q) - n(1-2q) \right] + O(e^3). \tag{104}$$



Note that all $e$ corrections nullify for $q = \frac{1}{2}$, since in this limit we should get a memory-less process. These equations
produce for the entropy from (100), (43):

$$h = -(1-q) \ln(1-q) - q \ln q \tag{105}$$
$$\qquad + 2e(1-2q) \ln \frac{1-q}{q} \tag{106}$$
$$\qquad - 2e^2 (1-2q) \left[ \ln \frac{1-q}{q} + \frac{1-2q}{4(1-q)^2 q^2} \right] + O(e^3). \tag{107}$$

Eq. (105) is just the Markov entropy (73) obtained in the limit $e = 0$. Eq. (106) is the first correction to the
Markov situation; it is obtained in jlH |13|. The second correction (107) is reported in [15]. The authors of [15] also
obtain the higher-order corrections employing the mapping of the binary symmetric HMP to the one-dimensional
Ising model. These higher-order corrections can also be obtained within the present method. Thus we demonstrated
that the small-noise (quasi-Markov) situation can be adequately explored with the present method.
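
The small-noise expansion (105)-(107) can be compared with a brute-force estimate of the entropy. The Python sketch below computes the conditional entropy $H(N) - H(N-1)$ by enumerating all $2^N$ sequences of the binary symmetric HMP and compares it with the three terms of the expansion. It assumes that for small $e$ a modest block length ($N = 8$ here) already approximates $h$ far better than the size of the $O(e^3)$ remainder; the function names and the chosen values of $q$, $e$ and $N$ are illustrative.

```python
import math
from itertools import product

def sequence_prob(xs, q, e):
    """P(x_1, ..., x_N) of the binary symmetric HMP via the transfer matrices (97)."""
    v = (0.5, 0.5)
    for x in xs:
        emit = (1.0 - e, e) if x == 1 else (e, 1.0 - e)
        v = (emit[0] * ((1.0 - q) * v[0] + q * v[1]),
             emit[1] * (q * v[0] + (1.0 - q) * v[1]))
    return v[0] + v[1]

def block_entropy(n, q, e):
    """H(N): entropy of all blocks of length n, by direct enumeration."""
    total = 0.0
    for xs in product((1, 2), repeat=n):
        p = sequence_prob(xs, q, e)
        if p > 0.0:
            total -= p * math.log(p)
    return total

def conditional_entropy(n, q, e):
    """H(N) - H(N - 1), an upper estimate of the entropy h."""
    return block_entropy(n, q, e) - block_entropy(n - 1, q, e)

def small_noise_entropy(q, e):
    """Sum of the terms (105), (106) and (107)."""
    ratio = math.log((1.0 - q) / q)
    h0 = -(1.0 - q) * math.log(1.0 - q) - q * math.log(q)
    h1 = 2.0 * e * (1.0 - 2.0 * q) * ratio
    h2 = -2.0 * e ** 2 * (1.0 - 2.0 * q) * (
        ratio + (1.0 - 2.0 * q) / (4.0 * (1.0 - q) ** 2 * q ** 2))
    return h0 + h1 + h2

q, e = 0.3, 1e-3
print(conditional_entropy(8, q, e), small_noise_entropy(q, e))
```

The two printed values should agree closely for small $e$, and the agreement should degrade as $e$ grows, which gives a rough feeling for the range where the quasi-Markov expansion is useful.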

In addition we obtain the small-noise expressions (101)-(104) for the zeta-function. This result is new and it allows one
to find the moment-generating function, which contains more information than the entropy, e.g., (100)-(104) can be
used for approximating the rate functions (37) and (39). In particular, for the generating function we get from (41)
and (101)-(104)

$$\Lambda(n) = q^n + (1-q)^n + \frac{e\, n (1-2q) \left[ (1-q)^2 q^{2n} - (1-q)^{2n} q^2 \right]}{q^2 (1-q)^2 \left[ (1-q)^n + q^n \right]} + O(e^2). \tag{108}$$



IX. SUMMARY. 



In this paper we studied the entropy and the moment-generating function of Hidden Markov Processes (HMP). 
The fact that these processes model non-Markov memory is at the origin of their numerous applications, and, simul- 
taneously, the main reason of difficulties in characterizing their entropy and the moment-generating function. Recall 
that the entropy gives the number of sequences in the typical set of the random process [g, Q ; the typical set is the 
smallest set of realizations with the overall probability close to one. Alternatively, the entropy is the uncertainty [per 
time-unit] of the process given its long history. The generating function allows to estimate the [small] probability of 
atypical sequences via the Chernoff bound and the rate functions [6 J5 ] . The entropy of HMP was studied via upper 
and lower bounds @, [lC L expansions over small parameters [l5l. fl6l 17|. and via expressing the entropy as a solution 
of an integral equation[3, H, [H HE GJ, El • 

Here we proposed to calculate the entropy and the moment-generating function of HMP via the cycle expansion
of the zeta-function, a method adopted from the theory of dynamical systems [H [13, HU . We showed that this method
has two basic advantages. First, it produces exact results, both for the entropy and the moment-generating function,
for a class of HMP. We have not so far found any systematic way of searching for the exact solutions within this
method. The examples of exact solutions presented in section VII E were obtained in the most straightforward way.
Second, even if no exact solution is found, the method offers an expansion for the entropy and the moment-generating
function via an exponentially convergent power series (25l . l27l . [2^ | . Cutting off these expansions at some finite order
normally gives an improvable approximation for the sought quantities, especially since there are qualitative estimates
for the convergence radius of the series. This was demonstrated in section VIII.

As a by-product of this study, we conjectured in section VII C tentative conditions under which an HMP reduces
to a finite-order Markov process. These conditions compare favorably with those existing in the literature, see e.g. [34],
and they deserve further exploration. We also conjectured relations (92)-(94) between the rate functions of the random
process and its entropy.

APPENDIX A: RECOLLECTION OF SOME FACTS ABOUT THE EIGEN-REPRESENTATION VERSUS SINGULAR VALUE DECOMPOSITION.

A matrix $A$ can be diagonalized if [18]

$$A = V D V^{-1}, \tag{A1}$$

where $D$ is a diagonal matrix, and where $V$ is an invertible matrix. Writing the eigen-resolution of $D$,
$D = \sum_k \alpha_k |\alpha_k\rangle\langle\alpha_k|$, where $\langle\alpha_k|\alpha_n\rangle = \delta_{kn}$, one gets

$$A = \sum_k \alpha_k |R_k\rangle\langle L_k|, \tag{A2}$$

where $\alpha_k$ are the eigenvalues of $A$ (i.e., the solutions of $\det(A - \alpha\, 1) = 0$), and where $|R_k\rangle$ and $\langle L_k|$ are, respectively,
the right and left eigenvectors:

$$A|R_k\rangle = \alpha_k |R_k\rangle, \quad \langle L_k|A = \alpha_k \langle L_k|, \quad \langle L_k|R_n\rangle = \delta_{kn}. \tag{A3}$$

Note that in general $\langle L_k|L_n\rangle \neq \delta_{kn}$. The right and left eigenvectors coincide for normal matrices, $[A, A^\dagger] = 0$ ($A$
commutes with its Hermitian conjugate). For those matrices $V$ is unitary.

Not every matrix can be diagonalized. A necessary and sufficient condition for this is that for each eigenvalue the
algebraic degeneracy (i.e., the degeneracy of this eigenvalue as a root of the characteristic polynomial) coincides with the
geometric degeneracy (the number of eigenvectors corresponding to this eigenvalue; the geometric degeneracy cannot be
larger than the algebraic one). Thus a sufficient condition for a matrix to be diagonalizable is that its eigenvalues
are not degenerate. Here is a more general sufficient condition: any matrix that commutes with a matrix with
non-degenerate eigenvalues is diagonalizable [L3| .

If for one eigenvalue $\alpha$ of $A$ the algebraic and geometric degeneracies are equal (say to $m$), then

$$A = V \begin{pmatrix} \alpha\, I_{m \times m} & 0 \\ 0 & B \end{pmatrix} V^{-1}, \tag{A4}$$

where $I_{m \times m}$ is the $m \times m$ unit matrix, and $B$ denotes the remaining block.

An alternative representation for the matrix $A$ is given by the singular value decomposition. Note that if $\det A \neq 0$,
the matrix $A [A^\dagger A]^{-1/2}$ is unitary. Then it holds

$$A = U [A^\dagger A]^{1/2}, \tag{A5}$$

where $U$ is unitary. Eq. (A5) holds also for $\det A = 0$ via continuity. Going to the eigen-resolution of the hermitian
matrix $A^\dagger A$, we see that for any matrix $A$ there is a singular value decomposition:

$$A = \sum_k \sigma_k |u_k\rangle\langle v_k|, \tag{A6}$$

$$A|v_k\rangle = \sigma_k |u_k\rangle, \quad \langle v_k|v_n\rangle = \delta_{kn}, \tag{A7}$$
$$\langle u_k|A = \sigma_k \langle v_k|, \quad \langle u_k|u_n\rangle = \delta_{kn}, \tag{A8}$$

where $\sigma_k$ (singular values of $A$) is the common eigenvalue spectrum of $\sqrt{A A^\dagger}$ and $\sqrt{A^\dagger A}$.

For a given diagonalizable matrix $A$, its singular value decomposition is related to the eigen-resolution via [18]

$$\langle v_n|R_k\rangle \sigma_n = \alpha_k \langle u_n|R_k\rangle, \tag{A9}$$
$$\langle u_n|L_k\rangle \sigma_n = \alpha_k^* \langle v_n|L_k\rangle. \tag{A10}$$

The matrix $A$ is normal if and only if $|\alpha_k| = \sigma_k$. (I did not find any standard reference on the fact that $|\alpha_k| = \sigma_k$
leads to normality; the proof I got myself is too tedious to be presented here.)

Singular values and eigenvalues are related via the Weyl inequalities. For a given matrix $A$, order the absolute values
of its eigenvalues as $l_0 \geq l_1 \geq \ldots \geq l_n$, and order its singular values as $\sigma_0 \geq \sigma_1 \geq \ldots \geq \sigma_n$. The Weyl inequalities then
read:

$$\prod_{k=0}^{m} \sigma_k \geq \prod_{k=0}^{m} l_k, \qquad \prod_{k=0}^{m} \sigma_{n-k} \leq \prod_{k=0}^{m} l_{n-k}, \tag{A11}$$

$$\sum_{k=0}^{m} \sigma_k^p \geq \sum_{k=0}^{m} l_k^p, \quad p > 0. \tag{A12}$$

For $n = m$, (A11) leads to the equality $\prod_{k=0}^{n} \sigma_k = \prod_{k=0}^{n} l_k$.

APPENDIX B: ADDITIONAL FEATURES OF THE ENTROPY. 

Recall the definitions (17) and (16) of the entropy $h$ and the block entropy $H(N) = H(X_N, \ldots, X_1)$, respectively, for
the stationary process $X$. Define:

$$h(N) = H(N) - H(N-1) = H(X_N|X_{N-1}, \ldots, X_1). \tag{B1}$$

$h(N)$ [sometimes called innovation entropy] is the uncertainty of $X_N$ given its history $X_{N-1}, \ldots, X_1$. It is clear that
once $\lim_{N \to \infty} \frac{H(N)}{N}$ exists, $h(N)$ converges to the source entropy for $N \to \infty$. One can show that Q

$$\frac{H(N)}{N} \geq h(N) \geq h(N+1) \geq h. \tag{B2}$$

To derive the second inequality in (B2) note that the stationarity and the entropy reduction due to conditioning imply

$$h(N) = H(X_N|X_{N-1}, \ldots, X_1) = H(X_{N+1}|X_N, \ldots, X_2) \geq H(X_{N+1}|X_N, \ldots, X_1) = h(N+1). \tag{B3}$$

The first inequality in (B2) is shown as follows:

$$\frac{H(N)}{N} = \frac{1}{N} H(X_1) + \frac{1}{N} \sum_{i=2}^{N} H(X_i|X_{i-1}, \ldots, X_1) \geq \frac{1}{N} \sum_{i=1}^{N} H(X_N|X_{N-1}, \ldots, X_1) = h(N), \tag{B4}$$

where the first equality is the chain rule for the conditional information, while the inequality in (B4)
follows from the stationarity $H(X_1) = H(X_N)$, and then from the same reasoning as in (B3). The last inequality in
(B2) is now obvious.

The meaning of $\frac{H(N)}{N} \geq h = \lim_{N \to \infty} \frac{H(N)}{N}$ is that taking into account all the correlations decreases the entropy.

In a related context, $h(N) \geq h(N+1)$ means that the innovations decrease under accumulation of experience. This
inequality can be employed for putting an upper bound for $H(N+1)$ in terms of $H(N)$ and $H(N-1)$:

$$2H(N) - H(N-1) \geq H(N+1) \geq H(N). \tag{B5}$$

Note also that $H(N+1) = H(N) + h(N+1) \leq H(N) + \frac{H(N+1)}{N+1}$ leads to

$$\frac{H(N+1)}{N+1} \leq \frac{H(N)}{N}, \tag{B6}$$

i.e., the uncertainty per step decreases when increasing $N$.
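
The chain of inequalities (B2) and the relations (B5), (B6) can be observed numerically on a concrete process. The Python sketch below enumerates all sequences of the three-state HMP (66)-(70), with the parameters of the first row of Table I, and tabulates $H(N)$, $h(N)$ and $H(N)/N$. It assumes that the probability of a sequence is obtained by applying the transfer matrices (70) successively to the normalized stationary vector (69) and summing the components; function names and the maximal block length are illustrative.

```python
import math
from itertools import product

def transfer_matrices(p1, p2, q1, q2, r1, r2):
    """Transfer matrices (70): T(1) keeps the first row of P, T(2) the other two."""
    P = ((1.0 - p1 - p2, q1, r1),
         (p1, 1.0 - q1 - q2, r2),
         (p2, q2, 1.0 - r1 - r2))
    zero = (0.0, 0.0, 0.0)
    return {1: (P[0], zero, zero), 2: (zero, P[1], P[2])}

def stationary(p1, p2, q1, q2, r1, r2):
    """Normalized stationary vector (69)."""
    w = (q1 * (r1 + r2) + q2 * r1,
         r2 * (p1 + p2) + p1 * r1,
         p2 * (q1 + q2) + p1 * q2)
    total = sum(w)
    return tuple(x / total for x in w)

def block_entropies(n_max, params):
    """Return [H(0), H(1), ..., H(n_max)] with H(0) = 0."""
    T = transfer_matrices(*params)
    st = stationary(*params)
    H = [0.0]
    for n in range(1, n_max + 1):
        total = 0.0
        for xs in product((1, 2), repeat=n):     # xs = (x_1, ..., x_n)
            v = st
            for x in xs:
                t = T[x]
                v = tuple(sum(t[i][j] * v[j] for j in range(3)) for i in range(3))
            p = sum(v)
            if p > 0.0:
                total -= p * math.log(p)
        H.append(total)
    return H

params = (0.75, 0.10, 0.25, 0.20, 0.45, 0.0)     # first row of Table I
H = block_entropies(10, params)
h = [None] + [H[n] - H[n - 1] for n in range(1, len(H))]
for n in range(1, len(H)):
    print(n, round(H[n] / n, 6), round(h[n], 6))
```

In the printed table both columns should decrease with $N$, with $h(N)$ staying below $H(N)/N$; $h(2)$ is the upper bound $H(X_2|X_1)$ of (74).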



APPENDIX C: ERGODIC FEATURES OF THE SINGULAR VALUES FOR A RANDOM MATRIX PRODUCT.

Let us recall some important features of the Lyapunov exponents of the random matrix product (8). Employ the
known relation between the singular values of $AB$ versus those of $A$ and $B$ [18]

$$\prod_{k=0}^{m} \sigma_k[AB] \leq \prod_{k=0}^{m} \sigma_k[A]\, \sigma_k[B], \tag{C1}$$

where $0 \leq m \leq L - 1$, and where the ordering (15) is assumed: $\sigma_0[A] \geq \sigma_1[A] \geq \ldots$

Now recall definitions $\\T($. Applying (C1) with $m = 0$ to $T(x_{N \ldots 1})$ we get ($M < N$)

$$\ln \sigma_0[T(x_{N \ldots 1})] \leq \ln \sigma_0[T(x_{M-1 \ldots 1})] + \ln \sigma_0[T(x_{N \ldots M})]. \tag{C2}$$

Thus, $\ln \sigma_0[T(x_{N \ldots 1})]$ is sub-additive. Together with the assumptions i), ii) and iii) of section IV A, Eq. (C2) ensures
the applicability of the sub-additive ergodic theorem [ijj [2(|. This leads (for $N \to \infty$) to the probability-one
convergence (24):

$$-\frac{1}{N} \ln \sigma_k[T(x_{N \ldots 1})] \to \mu_k, \tag{C3}$$

for $k = 0$. Applying in the same way (C1) with $m = 1$ to $T(x_{N \ldots 1})$, we use the sub-additivity for
$\ln \left( \sigma_0[T(x_{N \ldots 1})]\, \sigma_1[T(x_{N \ldots 1})] \right)$, deduce (24) for $k = 1$, and so on. It is clear that we could not employ the
sub-additivity directly for $l_k[T(x_{N \ldots 1})]$ (modules of the eigenvalues), since they in general do not satisfy anything like (C1).

The sub-additive ergodic theorem is related to the additive (Birkhoff-Khinchin) ergodic theorem that claims the
existence (with probability one) of a similar limit for a function $\sum_{k=1}^{N} f[X_k]$ of the stationary random process
$X = \{X_1, X_2, \ldots\}$ m.



APPENDIX D: EIGENVALUES AND SINGULAR VALUES OF THE RANDOM MATRIX PRODUCT. 

Recall section IV B and the main question posed there: when the modules of the eigenvalues of the matrix product
$T(x_{N \ldots 1})$ are equal, for $N \gg 1$, to the singular values of $T(x_{N \ldots 1})$.

As shown by (25), for $N \gg 1$ we can keep the dependence on $N$ only in the singular values of $T$. (We simplified
notations as $T(x_{N \ldots 1}) = T$.) First assume that $T$ is a $2 \times 2$ matrix. Write the singular value decomposition (A5) for
$T$ as

$$T = \begin{pmatrix} e^{-N\mu_0} & 0 \\ 0 & e^{-N\mu_1} \end{pmatrix} U, \qquad U = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \tag{D1}$$

where $e^{-N\mu_0}$ and $e^{-N\mu_1}$ [with $\mu_0 < \mu_1$] are the singular values of $T$, and where the matrix $U$ can be taken real, since
$T$ is real. Thus $U$ is orthogonal: $ab + cd = 0$, $a^2 + c^2 = b^2 + d^2 = 1$, $ad - bc = \pm 1$.

For the modules of the eigenvalues of $T$ in (D1) one finds

$$l_0 = |a|\, e^{-N\mu_0} \left[ 1 + O\!\left(e^{-N(\mu_1 - \mu_0)}\right) \right], \qquad l_1 = \frac{1}{|a|}\, e^{-N\mu_1} + O\!\left(e^{-N(2\mu_1 - \mu_0)}\right). \tag{D2}$$

If $|a| \neq 0$, the singular values of $T$ coincide with the absolute values of its eigenvalues for $N \gg 1$ [23]: the terms
$O(e^{-N(\mu_1 - \mu_0)})$ and $O(e^{-N(2\mu_1 - \mu_0)})$ are negligible and $\ln|a|$ is also neglected inside of the exponents as compared to
$N\mu_0$ and $N\mu_1$.

This conclusion changes for $a = 0$ (and thus $d = 0$ since $U$ is orthogonal). Now the modules of the eigenvalues
coincide with each other and are equal to $e^{-N(\mu_1 + \mu_0)/2}$, which is different from the singular values.
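
The two regimes of the $2 \times 2$ example are easy to reproduce numerically. The Python sketch below builds $T$ as in (D1) from a rotation matrix $U$ and compares the eigenvalue moduli with $|a| e^{-N\mu_0}$ and $e^{-N\mu_1}/|a|$; it then repeats the calculation for $a = d = 0$. The function names and the numerical values of $N$, $\mu_0$, $\mu_1$ and the rotation angle are illustrative assumptions.

```python
import math

def eigen_moduli(t):
    """Moduli of the two eigenvalues of a real 2 x 2 matrix, largest first."""
    tr = t[0][0] + t[1][1]
    det = t[0][0] * t[1][1] - t[0][1] * t[1][0]
    disc = tr * tr - 4.0 * det
    if disc < 0.0:                        # complex-conjugate pair
        return math.sqrt(det), math.sqrt(det)
    big = 0.5 * (abs(tr) + math.sqrt(disc))
    small = abs(det) / big if big > 0.0 else 0.0
    return big, small

def product_matrix(n, mu0, mu1, u):
    """T = diag(exp(-n mu0), exp(-n mu1)) U, as in (D1)."""
    s0, s1 = math.exp(-n * mu0), math.exp(-n * mu1)
    (a, b), (c, d) = u
    return ((s0 * a, s0 * b), (s1 * c, s1 * d))

n, mu0, mu1, theta = 40, 0.1, 0.5, 0.7
rotation = ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta), math.cos(theta)))
l0, l1 = eigen_moduli(product_matrix(n, mu0, mu1, rotation))
print(-math.log(l0) / n, -math.log(l1) / n)      # close to mu0 and mu1

swap = ((0.0, -1.0), (1.0, 0.0))                 # the case a = d = 0
m0, m1 = eigen_moduli(product_matrix(n, mu0, mu1, swap))
print(-math.log(m0) / n, -math.log(m1) / n)      # both close to (mu0 + mu1) / 2
```

The residual difference between $-\frac{1}{N}\ln l_0$ and $\mu_0$ in the first case is $-\frac{1}{N}\ln|a|$, which vanishes for $N \to \infty$.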
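The next example is a $3 \times 3$ matrix $T$ with the determinant equal to zero:

$$ T = \begin{pmatrix} e^{-N\mu_0} & 0 & 0 \\ 0 & e^{-N\mu_1} & 0 \\ 0 & 0 & 0 \end{pmatrix} U, \qquad U = \begin{pmatrix} a & b & e \\ c & d & f \\ x & y & z \end{pmatrix}, \tag{D3} $$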



where $e^{-N\mu_0}$ and $e^{-N\mu_1}$ [with $\mu_0 < \mu_1$] are the two non-zero singular values of $T$, and where the matrix $U$ is orthogonal.
Note that provided the third Lyapunov exponent $\mu_2$ is larger than $\mu_1$ (and provided we do not use the orthogonality
features of the matrix $U$ in (D3)), the considered example is sufficiently general.

Since $\det T = 0$, the third singular value of $T$ is zero. The third eigenvalue of $T(x_{N\ldots 1})$ is also equal to zero, while
for the absolute values of the remaining eigenvalues we have from (D3)

$$ l_0 = |a|\, e^{-N\mu_0}\left[1 + O\!\left(e^{-N(\mu_1-\mu_0)}\right)\right], \qquad l_1 = \frac{|ad - bc|}{|a|}\, e^{-N\mu_1} + O\!\left(e^{-N(2\mu_1-\mu_0)}\right). \tag{D4} $$

If $|ad - bc| \neq 0$, the singular values $e^{-N\mu_0}$ and $e^{-N\mu_1}$ coincide [for $N \gg 1$] with the moduli of the eigenvalues. For
$|ad - bc| = 0$ the second eigenvalue of $T$ is equal to zero, while the second singular value is non-zero. However, the
first Lyapunov exponent is still equal to the spectral radius (modulus of the first eigenvalue) if $a \neq 0$. The latter two
quantities are not equal for $a = 0$. Now the moduli of both eigenvalues of $T(x_{N\ldots 1})$ reduce to $\sqrt{|bc|}\, e^{-N(\mu_0+\mu_1)/2}$.
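The intermediate step behind these statements can be written out as follows. This is a sketch that uses only the fact that the last row of $T$ in (D3) vanishes, so that only the entries $a, b, c, d$ of $U$ enter the characteristic polynomial:

$$ \det(\lambda - T) = \lambda\left[\lambda^2 - \left(a\, e^{-N\mu_0} + d\, e^{-N\mu_1}\right)\lambda + (ad - bc)\, e^{-N(\mu_0+\mu_1)}\right]. $$

For $a \neq 0$ and $N \gg 1$ the leading root is $\lambda_0 \simeq a\, e^{-N\mu_0}$, and the product of the two non-zero roots, $(ad - bc)\, e^{-N(\mu_0+\mu_1)}$, then gives $\lambda_1 \simeq \frac{ad - bc}{a}\, e^{-N\mu_1}$. For $a = 0$ the linear term is of order $e^{-N\mu_1}$ and is negligible against $e^{-N(\mu_0+\mu_1)/2}$, so that $\lambda^2 \simeq bc\, e^{-N(\mu_0+\mu_1)}$.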

Using the examples (D1, D3) we obtained a sufficient condition for deciding whether the maximal singular value of $T$ is
equal to the modulus of the corresponding eigenvalue: the absolute values of the two leading eigenvalues of
$T$ should be different.



APPENDIX E: ZETA-FUNCTION AND PERIODIC ORBIT EXPANSION. 



1. Structure of periodic orbits. 



Define formally 



$$ Z_m = \sum_{i_1, \ldots, i_m = 1}^{M} \phi\left[A_{i_1} \cdots A_{i_m}\right], \tag{E1} $$



where $A_1, \ldots, A_M$ are matrices, and where $\phi[\cdot]$ is a function that turns its matrix argument into a number. We assume
that the following features hold for $\phi$ ($d$ is a positive integer):

$$ \phi[A^d] = \phi^d[A], \qquad \phi[AB] = \phi[BA]. \tag{E2} $$

Using these features one can prove for $Z_m$ the following formula [26]:

$$ Z_m = \sum_{n|m}\ \sum_{(\gamma_1, \ldots, \gamma_n) \in {\rm Per}(n)} n \left(\phi\left[A_{\gamma_1} \cdots A_{\gamma_n}\right]\right)^{m/n}, \tag{E3} $$



where $\sum_{n|m}$ means that the summation goes over all $n$ that divide $m$, e.g., $n = 1, 2, 4$ for $m = 4$. Here Per($n$) contains
sequences

$$ \Gamma = (\gamma_1, \ldots, \gamma_n) \tag{E4} $$
TABLE IV: The elements of Per(p) for p = 1, ..., 6 and M = 2. Here we denoted T(x_1) = 1 and T(x_2) = 2.
It is seen that Per(1) contains two elements, since the cyclic permutation is trivial. Per(2) contains a single element 12, since
11 and 22 remain invariant under a single cyclic permutation, while BA is obtained from AB via a single cyclic permutation.
Besides the obvious sequences 1111 and 2222, Per(4) does not include the sequences 1212 and 2121, which stay invariant after
two successive cyclic permutations. In Per(5) we first meet different elements that have the same overall number of 1's and 2's,
e.g., 12121 and 11122.

p    Per(p)
1    1, 2
2    12
3    122, 211
4    1222, 2111, 1122
5    12222, 21111, 11222, 22111, 12121, 21212
6    122222, 112222, 111222, 111122, 111112, 112212, 221121, 111212, 222121



TABLE V: The elements of Per(p) for p = 1, ..., 4 and M = 3.

p    Per(p)
1    1, 2, 3
2    12, 13, 23
3    122, 211, 233, 322, 133, 311, 123, 132
4    1222, 2111, 1122, 2333, 3222, 2233, 1333, 3111, 1133,
     1123, 1132, 1213, 2213, 2231, 2321, 3312, 3321, 3231



selected according to the following rules: i) $\Gamma$ turns to itself after $n$ successive cyclic permutations, but does not turn
to itself after any smaller (than $n$) number of successive cyclic permutations; ii) if $\Gamma$ is in Per($n$), then Per($n$) contains
none of those $n - 1$ sequences obtained from $\Gamma$ under $n - 1$ successive cyclic permutations.

Assume that $M = 2$, which means that the matrices $A_i$ can take two values $A_1 = 1$ and $A_2 = 2$. With the examples of
Per($n$) given in Table IV the proof of (E3) is straightforward.
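Formula (E3) can also be checked by direct enumeration. The Python sketch below is an illustration and not the original proof. It builds Per($n$) from rules i) and ii), evaluates $Z_m$ both from the definition (E1) and from the right-hand side of (E3), and compares the two. As an assumption made only for this test, $\phi$ is taken to be the determinant, which satisfies both features (E2); the matrices and all function names are arbitrary choices. The representative picked in each class of cyclic permutations is the lexicographically smallest one, so it may differ from the entry of Table IV by a cyclic permutation, which does not change $\phi$.

```python
from itertools import product

def per(n, symbols):
    """Per(n): one representative per class of cyclic permutations,
    keeping only sequences that are not invariant under fewer than n shifts."""
    reps = []
    for seq in product(symbols, repeat=n):
        rotations = [seq[k:] + seq[:k] for k in range(n)]
        if len(set(rotations)) == n and seq == min(rotations):
            reps.append(seq)
    return reps

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

def phi_of_product(seq, mats):
    """phi[A_{s1} ... A_{sn}] with phi = determinant (an assumed choice)."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    for s in seq:
        prod = matmul(prod, mats[s])
    return prod[0][0] * prod[1][1] - prod[0][1] * prod[1][0]

def z_direct(m, mats):
    """Z_m from (E1): sum over all sequences of length m."""
    idx = range(len(mats))
    return sum(phi_of_product(seq, mats) for seq in product(idx, repeat=m))

def z_cycles(m, mats):
    """Z_m from (E3): sum over divisors n of m and over Per(n)."""
    total = 0.0
    for n in range(1, m + 1):
        if m % n == 0:
            for gamma in per(n, range(len(mats))):
                total += n * phi_of_product(gamma, mats) ** (m // n)
    return total

# Two arbitrary real 2x2 matrices (M = 2).
mats = [((0.9, 0.2), (0.1, 0.5)), ((0.3, 0.4), (0.6, 0.2))]
for m in range(1, 7):
    print(m, z_direct(m, mats), z_cycles(m, mats))
```

The sizes of the generated sets, `[len(per(n, (1, 2))) for n in range(1, 7)]`, can be compared with the number of entries in each row of Table IV.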



2. The inverse zeta-function and derivation of Eq. (44).



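The inverse zeta function is defined as $\xi(z) = \exp\left[-\sum_{m=1}^{\infty} \frac{z^m}{m} Z_m\right]$, where $Z_m$ is given by (E1). Employing (E3)
and introducing notations $p = n$, $q = \frac{m}{n}$, we transform $\xi(z)$ as

$$ \xi(z) = \exp\left[-\sum_{p=1}^{\infty}\ \sum_{\Gamma \in {\rm Per}(p)}\ \sum_{q=1}^{\infty} \frac{z^{pq}}{q} \left(\phi\left[A_{\gamma_1} \cdots A_{\gamma_p}\right]\right)^q\right]. \tag{E5} $$

The summation over $q$ in (E5) is taken as

$$ \sum_{q=1}^{\infty} \frac{z^{pq}}{q} \left(\phi\left[A_{\gamma_1} \cdots A_{\gamma_p}\right]\right)^q = -\ln\left[1 - z^p \phi\left[A_{\gamma_1} \cdots A_{\gamma_p}\right]\right]. \tag{E6} $$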
We then finally get [26]:

$$ \xi(z) = \prod_{p=1}^{\infty}\ \prod_{\Gamma \in {\rm Per}(p)} \left[1 - z^p \phi\left[A_{\gamma_1} \cdots A_{\gamma_p}\right]\right]. \tag{E7} $$
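The product over periodic orbits can be compared numerically with the defining series of the inverse zeta function. The Python sketch below is a consistency check under simplifying assumptions and not the computation used in the paper: $\phi$ of a product is taken to be a product of scalar weights $w_i$, one per matrix (as happens when $\phi$ is the determinant), and both the series over $m$ and the product over $p$ are truncated. The weights, the value of $z$ and the function names are illustrative.

```python
import math
from itertools import product

def per(n, symbols):
    """Per(n): lexicographically smallest member of every class of cyclic
    permutations whose n cyclic shifts are all different."""
    reps = []
    for seq in product(symbols, repeat=n):
        rotations = [seq[k:] + seq[:k] for k in range(n)]
        if len(set(rotations)) == n and seq == min(rotations):
            reps.append(seq)
    return reps

def z_m(m, weights):
    """Z_m as a sum over all sequences of length m (multiplicative phi)."""
    idx = range(len(weights))
    return sum(math.prod(weights[i] for i in seq)
               for seq in product(idx, repeat=m))

def xi_from_series(z, weights, m_max):
    """Defining series exp[-sum_m z^m Z_m / m], truncated at m_max."""
    return math.exp(-sum(z ** m * z_m(m, weights) / m
                         for m in range(1, m_max + 1)))

def xi_from_cycles(z, weights, p_max):
    """Product over Per(p) of [1 - z^p phi], truncated at p_max."""
    value = 1.0
    for p in range(1, p_max + 1):
        for gamma in per(p, range(len(weights))):
            phi = math.prod(weights[g] for g in gamma)
            value *= 1.0 - z ** p * phi
    return value

weights = (0.43, -0.18)  # assumed values of phi[A_1] and phi[A_2]
print(xi_from_series(0.5, weights, 12), xi_from_cycles(0.5, weights, 12))
```

For a multiplicative $\phi$ one has $Z_m = (w_1 + w_2)^m$, so both expressions should approach $1 - z(w_1 + w_2)$ as the truncation orders grow.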



3. How to generate the elements of Per(p) via Mathematica 5. 

The elements of Per(p) presented in Tables IV and V were generated by hand. For larger p it is more convenient
to generate these elements via Mathematica 5. Below we assume that the reader knows Mathematica at some average
level. First one should load the package of combinatorial functions:

    <<DiscreteMath`Combinatorica`          (E8)

Next one defines the function ListNecklaces2[c_List, n_Integer?Positive] [33], the first argument of which is a
list, e.g., {A,B}, while the second argument is a positive integer.

    (* all sequences of length n built from the elements of x *)
    AllCombinations[x_List, n_Integer?NonNegative]
        := Flatten[Outer[List, Sequence @@ Table[x, {n}]], n - 1];
    (* one representative per class of cyclic permutations *)
    ListNecklaces2[c_List, n_Integer?Positive] := Module[{},
        Return[OrbitRepresentatives[CyclicGroup[n], AllCombinations[c, n]]]];          (E9)

The definition of ListNecklaces2 proceeds via an auxiliary function AllCombinations. All other functions in (E9)
are contained in the package (E8).

Upon running ListNecklaces2[c, p] one gets the elements of Per(p) together with those sequences $(\gamma_1, \ldots, \gamma_p)$
that remain invariant under $\tilde{p}$ successive cyclic permutations, where $p/\tilde{p}$ is an integer. For our purposes we need only
the sequences which are invariant with respect to $p$ cyclic permutations, and are not invariant with respect to cyclic
permutations with any smaller $\tilde{p}$. So our next task is to get rid of those parasitic sequences, which stay invariant with
respect to $\tilde{p}$ cyclic permutations with $\tilde{p} < p$. To this end we designed a straightforward Mathematica program that
detects and eliminates the parasitic sequences by direct enumeration [obviously, nothing special has to be done for
prime numbers like p = 3, 5, 7, 11, 13]. The drawback of this program is that for each p in Per(p) one has to adjust
the details of this program. Anyhow, we were not able to make Mathematica 5 generate the elements of Per(p)
directly.
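The elimination step described above can be sketched in a way that does not depend on p. The Python fragment below is not the authors' Mathematica program. It assumes only that the input is a list of necklace representatives of the kind returned by ListNecklaces2, written here as strings, and it keeps a sequence only if the smallest number of cyclic permutations that maps it to itself equals its length. The names `minimal_period` and `drop_parasitic` are illustrative.

```python
def minimal_period(seq):
    """Smallest k >= 1 such that k cyclic permutations map seq to itself."""
    n = len(seq)
    for k in range(1, n + 1):
        # The minimal period always divides n, so other k can be skipped.
        if n % k == 0 and seq[k:] + seq[:k] == seq:
            return k
    return n

def drop_parasitic(necklaces):
    """Keep only sequences whose minimal period equals their length."""
    return [s for s in necklaces if minimal_period(s) == len(s)]

# Necklace representatives of length 4 over {A, B}, entered by hand.
print(drop_parasitic(["AAAA", "AAAB", "AABB", "ABAB", "ABBB", "BBBB"]))
```

The same function works for lists or tuples of symbols, since only slicing, concatenation and equality are used.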

Here is an example of the above scheme: ListNecklaces2[{A,B}, 3] generates a list of lists:

    {{A,A,A}, {A,A,B}, {A,B,B}, {B,B,B}}.          (E10)

After elimination of the parasitic sequences this results in

    Y = {{A,A,B}, {A,B,B}},          (E11)

where we introduced a shorthand Y. Now employing the construction

    Apply[Times, Map[f[#] &, Apply[Dot, Y, 1]]],          (E12)

where f is an arbitrary function, one gets

    f[A.A.B] f[A.B.B].          (E13)



The construction (E12) is useful when recovering the formulas for $\phi_k$ for large values of p.



